
Mobile Money and Financial Inclusion in Tanzania Challenge

Helping Tanzania, United Republic of

$2 250 USD

Completed (almost 7 years ago)


Start: Mar 26, 2019 to Jun 30, 2019 · Reveal: Jul 01, 2019

davidquartey

sample_submission clarification

Data · 20 Apr 2019, 21:26 · 3 replies

Hello guys,

Need clarification

no_financial_services | other_only | mm_only | mm_plus

0.5423 | 0.9987 | 0.12 | 0.0123

If the numbers represent probabilities, shouldn't they add up to 1? Or am I missing something?

Discussion: 3 answers

jabbott

Had the same thought! I suspect it's just a bad example.

21 Apr 2019, 07:58


EmileArthur

Hi, I am new to this challenge. Can you please explain the submission process?

23 Apr 2019, 03:05


Johnowhitaker

"Your goal is to accurately classify each individual into four mutually exclusive categories..."

So an ideal submission would be 0|1|0|0. But predictions can have uncertainty. And for multi-class classification, the output of a model is often a predicted probability for each class. Depending on the model, these may be calculated independently for each class and thus won't be guaranteed to sum to one.
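To see why independently computed class probabilities need not sum to one, here is a minimal pure-Python sketch. The raw scores are made up for illustration. `independent_probs` applies a separate sigmoid to each class, roughly as a one-vs-rest model might, while `softmax_probs` normalizes across all four classes so the row always sums to one. Both function names are hypothetical helpers, not part of any library.

```python
import math

CLASSES = ["no_financial_services", "other_only", "mm_only", "mm_plus"]

def independent_probs(scores):
    """One sigmoid per class: each value lies in (0, 1), but the row has no sum constraint."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

def softmax_probs(scores):
    """Softmax across classes: values lie in (0, 1) and always sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw model scores for one individual (not from the challenge data).
scores = [0.2, 2.5, -1.0, -3.0]

ind = independent_probs(scores)
soft = softmax_probs(scores)
print("independent:", [round(p, 4) for p in ind], "sum =", round(sum(ind), 4))
print("softmax:    ", [round(p, 4) for p in soft], "sum =", round(sum(soft), 4))
```

Both versions rank `other_only` highest; only the softmax row is constrained to sum to one, much like the sample submission row versus an ideal 0|1|0|0 row.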

As an example, scikit-learn has a predict_proba() method for many classifiers. Its outputs are not always well calibrated, though, especially for some tree-based models. The scikit-learn guide on probability calibration explains the problem and how to improve it: https://scikit-learn.org/stable/modules/calibration.html
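One way to check calibration is to group predictions into probability bins and compare the average predicted probability in each bin with how often the class actually occurred. scikit-learn's `sklearn.calibration.calibration_curve` computes a similar table. The sketch below is a dependency-free version for a single class (one column of the submission). It assumes `y_true` holds 0/1 indicators and `y_prob` the predicted probabilities; the helper name `reliability_table` and the numbers are hypothetical.

```python
def reliability_table(y_true, y_prob, n_bins=5):
    """Return (mean predicted prob, observed frequency, count) for each non-empty bin.

    Bins split [0, 1] into n_bins equal-width intervals; a probability of
    exactly 1.0 is placed in the last bin.
    """
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)
        sums[b] += p
        hits[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]

# Hypothetical predictions for one class: 1 = the individual is in that class.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.15, 0.25, 0.35, 0.55, 0.65, 0.72, 0.85, 0.92, 0.97]

for mean_p, freq, n in reliability_table(y_true, y_prob):
    print(f"predicted {mean_p:.2f}  observed {freq:.2f}  (n={n})")
```

For a well-calibrated model the "predicted" and "observed" columns track each other; large gaps suggest the probabilities would benefit from the calibration methods in the linked guide.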

As for why you may want to submit these predicted probabilities rather than just picking the most likely class and submitting a definite prediction (0|0|0|1), consider the goal of optimizing the score, i.e. minimizing the loss. In cases where the model is more certain, we want to be as close as possible to the true answer. In cases with uncertainty, predicting a value closer to 0.5 is a way of hedging your bets: a penalty is incurred either way, but when the model turns out to be wrong, that penalty is much smaller than the one for a confident wrong answer.
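The hedging argument can be made concrete with multiclass log loss, where each individual contributes -ln(p) for the probability p assigned to their true category. The thread only says "minimizing loss", so log loss is an assumption here, as is the clipping constant `eps=1e-15` (a common default; the challenge's exact scoring details are not given). `sample_log_loss` is a hypothetical helper, and rows are not renormalized, which is a simplification.

```python
import math

def sample_log_loss(probs, true_idx, eps=1e-15):
    """Log loss for one individual: -ln(p of the true class), clipped to avoid ln(0)."""
    p = min(max(probs[true_idx], eps), 1.0 - eps)
    return -math.log(p)

# Hypothetical case: the model's top guess is 'mm_plus' (index 3).
soft = [0.05, 0.10, 0.40, 0.45]  # submit the probabilities
hard = [0, 0, 0, 1]              # submit only the most likely class

# If the individual is actually 'mm_only' (index 2), the model is wrong:
print(f"wrong, soft: {sample_log_loss(soft, 2):.3f}")  # -ln(0.40)
print(f"wrong, hard: {sample_log_loss(hard, 2):.3f}")  # -ln(1e-15), a huge penalty

# If the individual really is 'mm_plus' (index 3), the hard row wins, but only slightly:
print(f"right, soft: {sample_log_loss(soft, 3):.3f}")  # -ln(0.45)
print(f"right, hard: {sample_log_loss(hard, 3):.3f}")  # -ln(1 - 1e-15), essentially 0
```

Under this assumed metric, a confident wrong answer costs far more than an uncertain one, while a confident right answer saves comparatively little.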

It would be interesting to compare the two approaches. Would anyone like to submit their predictions as probabilities, and then make a second submission with the highest probability mapped to 1 and the rest to 0?
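For the second submission in that experiment, each row's probabilities need to be turned into a one-hot row. A minimal sketch follows; `to_hard_labels` is a hypothetical helper, and reading and writing the submission file is left out because the thread does not describe its exact format.

```python
def to_hard_labels(rows):
    """Map each row's highest probability to 1 and every other entry to 0.

    Ties go to the first (leftmost) maximum, matching list.index(max(row)).
    """
    hard = []
    for row in rows:
        best = row.index(max(row))
        hard.append([1 if i == best else 0 for i in range(len(row))])
    return hard

# The example row from the sample submission in the original question.
rows = [[0.5423, 0.9987, 0.12, 0.0123]]
print(to_hard_labels(rows))  # [[0, 1, 0, 0]]
```

Submitting both versions and comparing their leaderboard scores would show how much the hedging in the probability version is worth on this data.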

23 Apr 2019, 08:07

