Improve accuracy when adding new data to a machine learning model

ML model having recall of .97 and precision of 93 and accuracy of 95 on test data but in completely new data it doesn't give good results. What could be the possible reason? – From Reddit

I have seen this too many times. Your model looks perfect, with high scores and a reasonably low inference time. Then you feed it new data to see how it would fare, and the results are poor at best. So you start to wonder what's wrong with the model, or whether it's the data.

This is most likely a case of overfitting: the model has learned the data from its training phase too closely and does not generalize to data it has not seen.

To fix this, make sure your data is set up correctly. Split your dataset into training data and testing data and, depending on your preference, add a validation set as well.
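As a rough illustration of the split described above, here is a minimal sketch in plain Python. The function name `split_dataset`, the 70/15/15 proportions, and the seed are my own assumptions, not a recommendation from the original question. In practice a library helper such as scikit-learn's `train_test_split` does the same job.

```python
import random

def split_dataset(rows, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over is training data
    return train, val, test

rows = list(range(100))  # stand-in for 100 labelled examples (hypothetical)
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))
```

Shuffling before splitting matters when the file is sorted (by date or by class, say); otherwise one split can end up with examples the others never see.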

Now train your model on the training data. It should learn enough to pick up the general pattern of the data.

Now check the model on the test data. If the results on the test data are good, then half of the problem is solved. If you made a validation set, use that set (not the test data) to tune your hyperparameters, so the test score stays an honest estimate.
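To make "checking" concrete, the sketch below computes accuracy, precision, and recall for two sets of predictions and reports how far the scores drop between them. The helper names and the label lists are made up for illustration; they are not results from the Reddit poster's model.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def score_gap(reference_scores, new_scores):
    """How much each metric drops from the reference data to the new data."""
    return {name: reference_scores[name] - new_scores[name]
            for name in reference_scores}

# Hypothetical labels and predictions, for illustration only.
test_true = [1, 0, 1, 1, 0, 1, 0, 0]
test_pred = [1, 0, 1, 1, 0, 1, 0, 1]
new_true = [1, 0, 1, 1, 0, 1, 0, 0]
new_pred = [0, 0, 1, 0, 1, 1, 0, 1]

test_scores = classification_scores(test_true, test_pred)
new_scores = classification_scores(new_true, new_pred)
gap = score_gap(test_scores, new_scores)
print(test_scores, new_scores, gap)
```

A large gap on every metric, as in this toy example, is the signal that the model or the new data needs a closer look.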

If the new data is still giving poor results, then you may want to look for mistakes in the model or the data.

Advice from this article:

First things first: simplify your model. Find the simplest model that can deal with your problem.

Second, turn off any extra features like batch normalization or dropout.

Third, verify that your input data is correct.

 

On a separate note, make sure the new data you're feeding to the model is correct as well. Sometimes we make minor mistakes, like forgetting to apply the same pre-processing when using a separate piece of data.
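One common version of this mistake is scaling the new data with its own statistics instead of the ones computed on the training data. The sketch below assumes a single numeric feature and simple standardization; the function names and the height values are hypothetical.

```python
import statistics

def fit_standardizer(train_values):
    """Compute the mean and standard deviation from the TRAINING data only."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        std = 1.0  # avoid dividing by zero for a constant feature
    return mean, std

def standardize(values, mean, std):
    """Apply a previously fitted mean and standard deviation to any data."""
    return [(v - mean) / std for v in values]

train_heights = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical feature
new_heights = [165.0, 175.0]                          # hypothetical new data

# Correct: reuse the statistics fitted on the training data.
mean, std = fit_standardizer(train_heights)
new_scaled = standardize(new_heights, mean, std)

# Mistake: re-fitting on the new data changes what the numbers mean.
refit_scaled = standardize(new_heights, *fit_standardizer(new_heights))

print(new_scaled, refit_scaled)
```

The two results differ even though the raw values are identical, so a model trained on the first scaling would see the second as different inputs.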

Doing this should remove any parts of your model that are adversely affecting your results. Checking the input data and the test data is a simple double-check: an error in the data can go unnoticed and may be affecting your model. Going through it gives you a chance to spot any of those errors.
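A double-check like this can be partly automated. The sketch below scans rows for missing, non-numeric, or out-of-range values; the column names and the allowed ranges are assumptions I picked for the example, and you would replace them with what you know about your own data.

```python
import math

def find_data_problems(rows, expected_ranges):
    """Return (row_index, column, reason) for every suspicious value."""
    problems = []
    for i, row in enumerate(rows):
        for column, (low, high) in expected_ranges.items():
            value = row.get(column)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append((i, column, "missing"))
            elif not isinstance(value, (int, float)):
                problems.append((i, column, "not numeric"))
            elif not low <= value <= high:
                problems.append((i, column, "out of range"))
    return problems

# Hypothetical columns and plausible ranges taken from the training data.
expected_ranges = {"age": (0, 120), "height_cm": (50, 250)}

new_rows = [
    {"age": 34, "height_cm": 172.0},
    {"age": None, "height_cm": 180.0},   # missing value
    {"age": 29, "height_cm": 1.75},      # metres instead of centimetres
    {"age": "41", "height_cm": 165.0},   # number stored as text
]

problems = find_data_problems(new_rows, expected_ranges)
for problem in problems:
    print(problem)
```

Unit mix-ups and numbers stored as text are exactly the kind of error that goes unnoticed by eye but shows up immediately in a check like this.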

Hopefully, after working through the steps above, the issue will be fixed. If not, go to the article I linked above and go through all the steps in that article.