Hey there, I'm seeing a large gap between the accuracy on X_test (a 30% hold-out split of the train.csv I use for local validation) and the score on test.csv. On X_test the model's accuracy is 0.86, but when I use the same model to predict churn for test.csv and upload the submission, the score is 0.5. Is this normal, or does it indicate an error? I have cross-checked everything multiple times.
Hello,
The leaderboard (LB) evaluation metric is AUC, not accuracy. Could you check the AUC on your local validation set? Note that a model can have very high accuracy but a low (~0.5) AUC.
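To illustrate how accuracy and AUC can disagree, here is a minimal sketch using made-up labels (not the poster's data): on an imbalanced set, a degenerate model that always predicts the majority class "no churn" gets high accuracy, but because every score is identical it cannot rank churners above non-churners, so its AUC is 0.5. The 86/14 class split is a hypothetical choice for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical validation labels: 86 non-churners (0) and 14 churners (1)
y_val = np.array([0] * 86 + [1] * 14)

# A degenerate "model" that always predicts no churn (class 0)
y_pred = np.zeros_like(y_val)

acc = accuracy_score(y_val, y_pred)  # 86 of 100 labels are correct
auc = roc_auc_score(y_val, y_pred)   # all scores tied, so no ranking ability
print(f"accuracy={acc:.2f}  AUC={auc:.2f}")  # prints accuracy=0.86  AUC=0.50
```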
Also make sure your predictions are probabilities between 0 and 1, not hard 0/1 class labels.
Thank you so much! I was not using Area Under the Curve (AUC) as the evaluation metric, and I was also submitting churn as hard 0/1 labels. Thanks a lot.
Hi, make sure you are predicting probabilities instead of hard class labels. For most sklearn classifiers, switching 'predict(test)' to 'predict_proba(test)[:,1]' should do the trick.
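A minimal sketch of the predict vs predict_proba switch, assuming churn is encoded as 1 (so column 1 of predict_proba is the churn probability). The data is synthetic, generated with make_classification as a stand-in for train.csv, and LogisticRegression is just an example classifier; the 30% hold-out mirrors the split described in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for train.csv (roughly 80% non-churn, 20% churn)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

# Hold out 30% for local validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

hard = model.predict(X_val)               # hard class labels (0 or 1)
proba = model.predict_proba(X_val)[:, 1]  # probability of class 1 (churn)

# Evaluate locally with the same metric the leaderboard uses
print("AUC from hard labels:  ", round(roc_auc_score(y_val, hard), 3))
print("AUC from probabilities:", round(roc_auc_score(y_val, proba), 3))
```

For the submission, the same predict_proba(...)[:, 1] call on the test.csv features gives the probability column to upload, and checking roc_auc_score on the hold-out split first gives a local estimate that is comparable to the LB score.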
That's exactly what solved my problem. Thank you!