How to Improve the Accuracy of a Machine Learning Model in Python
Model accuracy is a measure we use to decide which machine learning model makes the best predictions. It matters because the cost of errors can be high, so it is important to make our model as accurate as we can.
Methods to enhance the accuracy of our ML model
1. Add more data
Our model learns from the data we provide, so data is the backbone of any ML model. People tend to make better decisions as they gain more experience, and a model works the same way: given more representative data, it can generally learn the underlying patterns better, recognize more cases correctly, and make more accurate predictions.
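One way to check whether more data is likely to help is a learning curve, which plots the validation score against the number of training samples. If the score is still rising at the largest size, more data will probably improve the model. The sketch below assumes scikit-learn is installed and uses synthetic data from make_classification with a logistic regression model; the dataset and the training sizes are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic classification data stands in for a real dataset
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10, random_state=0)

# Train on growing fractions of each training fold, validate with 5-fold CV
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=[0.05, 0.25, 0.5, 1.0],
    cv=5,
)

# Average validation accuracy over the 5 folds for each training size
val_mean = val_scores.mean(axis=1)
for size, score in zip(train_sizes, val_mean):
    print(f"{int(size):4d} training samples -> validation accuracy {score:.3f}")
```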
2. Choose a score metric
A score metric tells us how well our model performs. Depending on the task, it could be R squared, adjusted R squared, the confusion matrix, F1 score, recall, explained variance, etc.
For example, 'r2_score' from scikit-learn shows how well a linear regression model performs. The best possible score is 1.0, a model that always predicts the mean of the target scores 0.0, and the score can be negative when the model does worse than that. As a rough rule of thumb, a score of around 0.65 or above suggests the model is quite good, although what counts as good depends on the problem.
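Here is a minimal sketch of the 'r2_score' example, assuming scikit-learn is installed and using synthetic data from make_regression in place of a real dataset. Besides the fitted model, it scores two deliberately naive constant predictors to show the reference points of the metric: always predicting the mean gives 0.0, and a constant that is far off gives a negative score.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data stands in for a real dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit on the training split, score on the held-out split
model = LinearRegression().fit(X_train, y_train)
model_score = r2_score(y_test, model.predict(X_test))

# Baseline: always predict the mean of y_test, which scores 0.0
mean_score = r2_score(y_test, np.full_like(y_test, y_test.mean()))

# A constant 3 standard deviations away from the mean scores 1 - (1 + 9) = -9
bad_score = r2_score(y_test, np.full_like(y_test, y_test.mean() + 3 * y_test.std()))

print(f"linear regression: {model_score:.3f}")
print(f"mean baseline:     {mean_score:.3f}")
print(f"bad constant:      {bad_score:.3f}")
```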
3. Feature Selection
Feature selection is an important factor in improving the accuracy of our model. Use only meaningful features, i.e. the features that most heavily influence the decisions made by the algorithm. You can select these features manually or with techniques such as Permutation Feature Importance (PFI).
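Permutation Feature Importance shuffles one feature at a time on held-out data and measures how much the model's score drops; features whose shuffling barely changes the score contribute little. A minimal sketch using scikit-learn's permutation_importance, assuming synthetic data in which only features 0 and 1 affect the target. The 0.05 threshold is a hypothetical cut-off chosen for this example, not a standard value.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
# Only features 0 and 1 influence the target; features 2-4 are pure noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Shuffle each feature 10 times on the test set and record the drop in R^2
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

threshold = 0.05  # hypothetical cut-off, tune it for your own data
selected = np.flatnonzero(result.importances_mean > threshold)

for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {i}: {mean:.3f} +/- {std:.3f}")
print("selected features:", selected.tolist())
```

The selected column indices can then be used to keep only those features, e.g. X[:, selected], before retraining.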
In the image above, a heatmap drawn in Python shows the correlation between the features of our data. From it we can see how strongly the features depend on each other and on the target, and keep only the features with a strong dependency, for example those whose correlation is higher than 0.55.
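The numbers behind such a heatmap can be computed with pandas' DataFrame.corr(). A minimal sketch, assuming a synthetic DataFrame with hypothetical columns x1, x2, x3 and a target column. It applies the 0.55 cut-off to the absolute correlation with the target, so strongly negative features would be kept as well. To draw the heatmap itself, the same correlation matrix can be passed to seaborn's heatmap function.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: x1 and x2 drive the target, x3 is unrelated noise
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
df["target"] = 2 * df["x1"] + 2 * df["x2"] + rng.normal(scale=0.5, size=n)

# Pearson correlation matrix of all columns (what the heatmap visualizes)
corr = df.corr()

# Correlation of each feature with the target, excluding the target itself
target_corr = corr["target"].drop("target")
selected = target_corr[target_corr.abs() > 0.55].index.tolist()

print(target_corr.round(3))
print("selected features:", selected)
```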
4. Cross-validation
Cross-validation is a statistical method that splits the data into several partitions, then trains and evaluates the model several times, each time holding out a different partition for testing. This lets us test our model on data it has not seen during training; if it gives good results across the partitions, we can be more confident in using that model.
A few common cross-validation techniques are:
- Train/test split
- K-fold cross-validation
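Both techniques are available in scikit-learn. A minimal sketch, assuming synthetic regression data and a linear regression model as stand-ins: train_test_split holds out a single partition (20% of the rows here), while KFold with cross_val_score trains the model 5 times and tests it once on each of the 5 partitions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic regression data stands in for a real dataset
X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)
model = LinearRegression()

# Train/test split: a single held-out partition of 20% of the rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
holdout_score = model.fit(X_train, y_train).score(X_test, y_test)

# K-fold: 5 shuffled partitions, each used exactly once as the test set
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(model, X, y, cv=kfold, scoring="r2")

print(f"hold-out R^2: {holdout_score:.3f}")
print(f"5-fold R^2:   {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```

The spread of the fold scores gives a rough idea of how much the result depends on which rows end up in the test set, something a single train/test split cannot show.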
5. Treat missing values in data
Missing values in the data often reduce the accuracy of our model, so it is important to deal with them. In Python, pandas provides the '.isnull()' method, which returns True for every missing (NaN or None) value and False otherwise, as shown below in the screenshot:
If you find missing values in your data, you have to handle them. You can drop the rows that contain them, or impute replacement values: for continuous variables, fill in the mean, median, or mode; for categorical variables, use a separate class for the missing values.
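A minimal pandas sketch of detecting and treating missing values, assuming a small hypothetical DataFrame with two continuous columns (age, income) and one categorical column (city). It shows both dropping rows and imputing: the median for continuous columns and a separate "Missing" class for the categorical one.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in every column
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0, np.nan],
    "income": [50_000.0, 62_000.0, np.nan, 58_000.0, 61_000.0],
    "city": ["Paris", None, "Lyon", "Paris", None],
})

# True where a value is missing (NaN or None), False elsewhere
mask = df.isnull()
print(mask.sum())  # missing values per column: age 2, income 1, city 2

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()

# Option 2: impute continuous columns with the median (mean or mode also work)
imputed = df.copy()
for col in ["age", "income"]:
    imputed[col] = imputed[col].fillna(imputed[col].median())

# Categorical column: treat missing values as their own class
imputed["city"] = imputed["city"].fillna("Missing")

print(imputed)
```

Dropping rows is simplest but throws away data, which works against the first tip above; imputing keeps every row at the cost of some guessed values.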
That’s it, hope this will help you!