Mobile Money and Financial Inclusion in Tanzania Challenge
Cash and prizes worth $2,250 USD
Who is most likely to use mobile money? And who is most likely to use other financial services?
26 March–30 June 2019 23:59
462 data scientists enrolled, 163 on the leaderboard
sample_submission clarification
published 20 Apr 2019, 21:26

Hello guys,

Need clarification

no_financial_services | other_only | mm_only | mm_plus

0.5423 | 0.9987 | 0.12 | 0.0123

If the numbers represent probabilities, shouldn't they add up to 1? Or am I missing something?

Had the same thought! I suspect it's just a bad example

Hi, I'm new to this challenge. Can you please explain the submission process?

"Your goal is to accurately classify each individual into four mutually exclusive categories..."

So an ideal submission would be 0|1|0|0. But predictions can have uncertainty. And for multi-class classification, the output of a model is often a predicted probability for each class. Depending on the model, these may be calculated independently for each class and thus won't be guaranteed to sum to one.
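To make that concrete, here is a minimal sketch using the row from the sample submission quoted above. Rescaling by the row sum is just one possible fix and is my own assumption; nothing in the competition rules quoted here says it is required.

```python
# Row copied from the sample submission quoted in this thread.
classes = ["no_financial_services", "other_only", "mm_only", "mm_plus"]
row = [0.5423, 0.9987, 0.12, 0.0123]

total = sum(row)  # well above 1, so this row is not a probability distribution
print(f"row sum: {total:.4f}")

# Assumed fix (not something the organisers asked for): divide by the row sum.
normalised = [p / total for p in row]

for name, p in zip(classes, normalised):
    print(f"{name}: {p:.4f}")
```

Rescaling keeps the ranking of the classes the same; it only changes how confident each value looks.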

As an example, scikit-learn has a predict_proba() method for many classifiers. The probabilities it returns aren't always well calibrated, especially with some tree-based models. The scikit-learn guide on probability calibration has more info on improving this: https://scikit-learn.org/stable/modules/calibration.html
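A rough sketch of what that looks like in practice. The data here is synthetic (make_classification standing in for the competition data), and the choice of a random forest with sigmoid calibration is only an illustration, not a recommendation for this challenge.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic four-class data standing in for the competition data (an assumption).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Raw class probabilities from a tree-based model.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
raw = forest.fit(X_train, y_train).predict_proba(X_test)

# The same kind of model wrapped in a calibrator fitted with 3-fold cross-validation.
calibrated_model = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    method="sigmoid",
    cv=3,
)
calibrated = calibrated_model.fit(X_train, y_train).predict_proba(X_test)

# One row per test sample, one column per class.
print(raw[0])
print(calibrated[0])
```

Both outputs have one column per class. Whether calibration actually helps on the real data is something you would have to check against a validation set.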

As for why you may want to submit these predicted probabilities as opposed to just picking the most likely class and submitting a definite prediction (0|0|0|1), consider the goal of optimising score / minimising loss. In cases where the model is more certain, we want the prediction to be as close as possible to the true 0 or 1. In cases with uncertainty, predicting a value closer to 0.5 is a way of hedging your bets - a penalty will be incurred either way, but it will be much lower than the penalty for a confident prediction that turns out to be wrong.
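A small worked example of the hedging argument, assuming the score is multi-class log loss (this thread doesn't actually name the metric, so treat that as my assumption). The probabilities are made up, and log_loss_row is a hypothetical helper written for this post.

```python
import math

def log_loss_row(true_index, probs, eps=1e-15):
    """Log loss for one row: minus the log of the probability given to the true class."""
    p = min(max(probs[true_index], eps), 1 - eps)  # clip to avoid log(0)
    return -math.log(p)

hedged = [0.10, 0.60, 0.25, 0.05]  # made-up model output
hard = [0.0, 1.0, 0.0, 0.0]        # same prediction with the top class mapped to 1

# true_index 1: the top class was right. true_index 2: the top class was wrong.
for true_index in (1, 2):
    print(
        f"true class {true_index}: "
        f"hedged loss {log_loss_row(true_index, hedged):.3f}, "
        f"hard loss {log_loss_row(true_index, hard):.3f}"
    )
```

Under this assumed metric the hard prediction wins slightly when it is right and loses heavily when it is wrong, which is the trade-off described above.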

It would be interesting to compare the two approaches - does someone feel like submitting their predictions as probabilities and then a second submission with the highest prob mapped to 1, the rest to 0?
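For anyone who wants to try the second variant, here is one way to map the highest probability to 1 and the rest to 0. The function name harden is my own, and ties go to the first class with the top value, which is an arbitrary choice.

```python
def harden(probs):
    """Return a 0/1 row with a single 1 at the position of the largest probability."""
    top = max(range(len(probs)), key=lambda i: probs[i])  # first index wins on ties
    return [1 if i == top else 0 for i in range(len(probs))]

# Made-up probability rows in the column order
# no_financial_services | other_only | mm_only | mm_plus
rows = [
    [0.10, 0.60, 0.25, 0.05],
    [0.05, 0.05, 0.10, 0.80],
]

for probs in rows:
    print(probs, "->", harden(probs))
```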