no_financial_services | other_only | mm_only | mm_plus
0.5423 | 0.9987 | 0.12 | 0.0123
If the numbers represent probabilities, shouldn't they add up to 1? Or am I missing something?
Had the same thought! I suspect it's just a bad example
Hi, I am new to this challenge. Can you please explain the submission process?
"Your goal is to accurately classify each individual into four mutually exclusive categories..."
So an ideal submission would be 0|1|0|0. But predictions can have uncertainty. And for multi-class classification, the output of a model is often a predicted probability for each class. Depending on the model, these may be calculated independently for each class and thus won't be guaranteed to sum to one.
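To make the point about independently calculated scores concrete, here is a minimal sketch. The numbers in `raw` are made up for illustration and `normalize_rows` is a hypothetical helper, not part of any challenge starter code. It shows rows that don't sum to one and one simple way to rescale them if you want them to.

```python
import numpy as np

# Hypothetical per-class scores for two individuals, with each column scored
# independently (for example by four separate binary models).
# Column order follows the submission header:
# no_financial_services | other_only | mm_only | mm_plus
raw = np.array([
    [0.70, 0.20, 0.30, 0.10],
    [0.05, 0.10, 0.15, 0.60],
])

def normalize_rows(p):
    """Rescale each row so that its entries sum to 1."""
    p = np.asarray(p, dtype=float)
    return p / p.sum(axis=1, keepdims=True)

normalized = normalize_rows(raw)
print(raw.sum(axis=1))         # row sums before rescaling
print(normalized.sum(axis=1))  # row sums after rescaling
```

Whether rescaling helps the score depends on how the metric is computed, so treat it as an option to try rather than a requirement.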
As an example, scikit-learn provides a predict_proba() method for many classifiers. The probabilities it returns aren't always well calibrated, especially with some tree-based models. The scikit-learn guide on probability calibration has more info on improving them: https://scikit-learn.org/stable/modules/calibration.html
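A rough sketch of what that looks like in practice, using synthetic data from `make_classification` as a stand-in for the challenge data. The choice of a random forest, the `sigmoid` method and `cv=3` are all assumptions for illustration, not a recommendation.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 4-class data standing in for the real features and labels.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Uncalibrated probabilities straight from a tree-based model.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)
raw_proba = forest.predict_proba(X_test)

# The same kind of model wrapped in a calibrator fitted with cross-validation.
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    method="sigmoid", cv=3,
)
calibrated.fit(X_train, y_train)
cal_proba = calibrated.predict_proba(X_test)

print(raw_proba[:3])  # one row per individual, one column per class
print(cal_proba[:3])
```

Both arrays have one column per class, so either can be written out in the submission format. Whether calibration improves the score is something to check on a validation set.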
As for why you may want to submit these predicted probabilities as opposed to just picking the most likely class and submitting a definite prediction (0|0|0|1), consider the goal of optimising the score, i.e. minimising the loss. Where the model is more certain, we want the prediction to be as close as possible to the true label. Where it is uncertain, predicting a value closer to 0.5 is a way of hedging your bets: a penalty will be incurred either way, but it will be lower in cases where the model turns out to be wrong.
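The hedging argument can be illustrated with a small sketch. The thread doesn't state the exact scoring metric, so this assumes a log-loss-style penalty. `log_loss_row` is a hypothetical helper and the prediction rows are made up.

```python
import math

def log_loss_row(pred, true_index, eps=1e-15):
    """Log loss for one row: -log of the probability given to the true class.

    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    p = min(max(pred[true_index], eps), 1 - eps)
    return -math.log(p)

# Column order: no_financial_services | other_only | mm_only | mm_plus
true_index = 3  # suppose this individual is actually mm_plus

confident_wrong = [0.0, 1.0, 0.0, 0.0]  # definite prediction, wrong class
hedged = [0.1, 0.5, 0.1, 0.3]           # favours the wrong class, but hedges
confident_right = [0.0, 0.0, 0.0, 1.0]  # definite prediction, right class

loss_wrong = log_loss_row(confident_wrong, true_index)
loss_hedged = log_loss_row(hedged, true_index)
loss_right = log_loss_row(confident_right, true_index)

print(loss_wrong, loss_hedged, loss_right)
```

Under this assumed metric, the confident wrong answer is penalised far more heavily than the hedged one, which is the trade-off described above.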
It would be interesting to compare the two approaches - does someone feel like submitting their predictions as probabilities and then a second submission with the highest prob mapped to 1, the rest to 0?
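For anyone who wants to try the second variant, here is a minimal sketch of mapping the highest probability in each row to 1 and the rest to 0. `harden` is a hypothetical helper name. The first row of `soft` is the sample row from the top of this thread and the second is made up.

```python
import numpy as np

def harden(proba):
    """Return a 0/1 array with a single 1 per row, at that row's largest value."""
    proba = np.asarray(proba, dtype=float)
    hard = np.zeros_like(proba)
    hard[np.arange(proba.shape[0]), proba.argmax(axis=1)] = 1.0
    return hard

# Column order: no_financial_services | other_only | mm_only | mm_plus
soft = np.array([
    [0.5423, 0.9987, 0.12, 0.0123],
    [0.10, 0.20, 0.30, 0.40],
])

hard = harden(soft)
print(hard)
```

Ties go to the first column with the maximum value, since that is how `argmax` behaves.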