
Expresso Churn Prediction Challenge

Helping Senegal
$1 000 USD
Completed (over 4 years ago)
Classification
Prediction
1378 joined
437 active
Start: 27 Aug 2021
Close: 28 Nov 2021
Reveal: 28 Nov 2021
National university of science and technology
Calculating AUC
Help · 5 Nov 2021, 13:59 · 2

Hey there, I am using the code below to get the AUC measure:

    from sklearn import metrics

    fpr, tpr, thresholds = metrics.roc_curve(y_test, probs)
    print(metrics.auc(fpr, tpr))

But I am getting a huge difference in AUC between predicting probabilities on y_test (the 30% held out from the training data to measure AUC) and on the test.csv file when I upload the submission: around 0.9 on the training hold-out versus 0.67 on the submitted solution. What could the problem be?

Discussion · 2 answers

Hi there,

The discrepancy between your local validation results and the actual submission score means your model is not generalising properly, and that can be due to several factors.

One major reason could be overfitting. If your model is overfitting the training data, it will score higher on a hold-out drawn from that same training data than on genuinely unseen data.

Also, make sure there is no data leakage between your training and validation splits: none of the rows you evaluate on should have been seen by the model during training. I suggest the train_test_split function from sklearn for clean splits.
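As a minimal sketch of what that looks like (using a synthetic dataset from make_classification and a logistic regression as stand-ins for the actual churn data and model, which are not shown in this thread):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the churn training set
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hold out 30% for validation; stratify so the class balance
# matches in both splits
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score only the held-out rows -- the model never saw these during fit
probs = model.predict_proba(X_val)[:, 1]
print(roc_auc_score(y_val, probs))
```

The key point is that the split happens before any fitting, so the validation AUC is computed on rows the model has never touched.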

National university of science and technology

I have used train_test_split to break the dataset into train and test sets, but it seems the overfitting might be due to some other reason.