Financial Inclusion in Africa
Knowledge/Pre-Qualification to AI Hack Tunisia 2019
Who is most likely to have a bank account?
1 August–15 September 2019 23:59
1081 data scientists enrolled, 606 on the leaderboard
Metric Being Used
published 1 Sep 2019, 17:57

Hi there! Since the data is imbalanced, the metric used should not be accuracy or error rate, but rather macro-F1 or ROC-AUC. That way we would be competing to build genuinely better models.

edited ~12 hours later

Please, Zindi, kindly look into this. ROC-AUC would be a better evaluation metric for judging how good our models really are.
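To illustrate the point above, here is a small sketch (toy numbers, not the actual competition data) of why accuracy misleads on imbalanced labels: a baseline that always predicts the majority class '0' scores high accuracy but a poor macro-F1. The metric helpers are written out by hand so the snippet stands alone.

```python
# Toy illustration: accuracy vs macro-F1 on imbalanced labels.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_macro(y_true, y_pred, classes=(0, 1)):
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

# 90 zeros, 10 ones -- roughly the kind of imbalance being discussed.
y_true = [0] * 90 + [1] * 10
y_all_zero = [0] * 100          # majority-class baseline, "learns" nothing

print(accuracy(y_true, y_all_zero))   # 0.9   -- looks great
print(f1_macro(y_true, y_all_zero))   # ~0.47 -- exposes the missed '1's
```

Accuracy rewards the do-nothing baseline; macro-F1 averages per-class scores, so the zero F1 on class '1' drags it down.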

True, ROC-AUC would be the better choice if this were a model that had to be put to real-life use, but isn't it a bit late for a metric change?

We can balance the data, no?

Balancing the data by oversampling/undersampling, or just weighting the classes properly, will give a bad score under the error-rate metric. If we were being scored on F1, balancing would be a good idea.
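A quick sketch of that trade-off (made-up scores, not the competition data): lowering the decision threshold, which is the same effect class weighting or resampling produces, catches more of the rare '1's but lets in false positives, so accuracy drops even as macro-F1 improves.

```python
# Toy illustration: a "balanced" model can lose on accuracy while winning on macro-F1.

def scores(y_true, y_pred):
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1s = []
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return acc, sum(f1s) / 2

# Imbalanced toy set: 20 zeros, 6 ones, with made-up model scores.
y_true = [0] * 20 + [1] * 6
probs = ([0.1] * 16 + [0.35, 0.38, 0.40, 0.45]      # zeros
         + [0.32, 0.36, 0.41, 0.44, 0.60, 0.65])    # ones

for threshold in (0.5, 0.34):   # 0.34 mimics a class-weighted model
    preds = [int(p >= threshold) for p in probs]
    acc, f1m = scores(y_true, preds)
    print(f"threshold={threshold}: accuracy={acc:.3f}, macro-F1={f1m:.3f}")
# threshold=0.5 : accuracy=0.846, macro-F1=0.705
# threshold=0.34: accuracy=0.808, macro-F1=0.766
```

So under the error-rate metric the unweighted model wins; under macro-F1 the "balanced" one would.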

So what should we do then?

replying to souhagaa
edited ~4 hours later

Classify data points as they are now. You will miss a lot of '1's relative to their total number in the dataset, and proportionally fewer '0's. Balancing the data or weighting classes is a business problem, not a data science one: if the cost of misclassifying ones as zeros is bearable for the business, then a model like the ones on the leaderboard right now is viable for production. I hope that's clear enough. If this were a cancer-detection challenge, then modelling this way would be wrong.
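That cost argument can be made concrete with a small sketch. All numbers below are hypothetical illustrations (two imaginary models, invented error counts and costs), not anything measured on the competition data: the point is only that which model "wins" flips with the cost of a missed '1'.

```python
# Toy illustration: choosing between models by expected misclassification cost.

def expected_cost(fn, fp, cost_fn, cost_fp):
    """Total cost of a model's errors on a test set."""
    return fn * cost_fn + fp * cost_fp

# Two hypothetical models evaluated on the same imaginary test set:
plain    = {"fn": 40, "fp": 5}    # accuracy-tuned: misses many '1's
weighted = {"fn": 10, "fp": 60}   # class-weighted: many more false alarms

for label, cost_fn, cost_fp in [
    ("bearable miss cost", 1, 1),     # missed '1's are cheap for the business
    ("severe miss cost", 50, 1),      # cancer-style: a missed '1' is disastrous
]:
    c_plain = expected_cost(plain["fn"], plain["fp"], cost_fn, cost_fp)
    c_weighted = expected_cost(weighted["fn"], weighted["fp"], cost_fn, cost_fp)
    better = "plain" if c_plain < c_weighted else "weighted"
    print(f"{label}: plain={c_plain}, weighted={c_weighted} -> pick {better}")
```

With symmetric costs the accuracy-tuned model is the rational pick, exactly as on this leaderboard; raise the cost of a false negative and the weighted model wins, as it would in a cancer-detection setting.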