Hey there, I am using the code below to get the AUC measure:
from sklearn import metrics

fpr, tpr, thresholds = metrics.roc_curve(y_test, probs)
print(metrics.auc(fpr, tpr))
But I am getting a huge difference in AUC between predicting probabilities on y_test (the 30% of the training data held out for measuring AUC) and on the test.csv file when I upload the submission: around 0.9 when predicting on the held-out training split, versus 0.67 on test.csv for the submitted solution. What does the problem seem to be?
Hi there,
The discrepancy between your local validation score and the actual submission score means your model is not generalising properly, and it could be due to several factors.
One major reason could be overfitting. If your model is overfitting the training data, it will score well on a held-out subset of that data while still doing poorly on genuinely unseen data.
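One quick check for this (a minimal sketch, using synthetic data and a RandomForestClassifier as stand-ins for your actual data and model) is to compare the AUC measured on the same rows the model was fit on against a cross-validated AUC; a large gap between the two is the classic sign of overfitting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for your training file
X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                           random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# AUC measured on the rows the model was fit on (optimistic)
train_auc = roc_auc_score(y, model.fit(X, y).predict_proba(X)[:, 1])

# AUC measured on held-out folds (closer to what a submission sees)
cv_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

print(f"train AUC: {train_auc:.3f}, cross-validated AUC: {cv_auc:.3f}")
```

If the train AUC is near 1.0 while the cross-validated AUC is much lower, the model is memorising rather than generalising, and a simpler or more regularised model is worth trying.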
Also ensure there is no data leakage between your training and test splits; that is, none of the data you evaluate on should ever have been seen by the model during fitting. I suggest the train_test_split function from sklearn for clean splits.
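As a minimal sketch of a clean split plus AUC evaluation (assuming a generic LogisticRegression and synthetic data in place of your actual model and training file):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for your training data
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out 30%, stratified so both splits keep the same class balance;
# the model never sees these rows during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probabilities for the positive class, then AUC on the held-out split
probs = model.predict_proba(X_test)[:, 1]
print(roc_auc_score(y_test, probs))
```

The key point is that any preprocessing fitted on the data (scalers, encoders, imputers) must also be fitted on X_train only and merely applied to X_test, otherwise information leaks into the evaluation.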
I have used train_test_split to split the dataset into train and test sets, but it seems the overfitting might be due to some other reason.