Dear ZINDI and fellow contestants, I have a question that needs clearing up, please.
It says that "The goal is to classify the image according to the type of wheat rust that appears most prominently in the image." and the log_loss function is being used as the evaluation metric.
So if a model predicts [0.21, 0.98, 0.63] for [healthy, leaf_rust, stem_rust], is the log_loss function calculated from the model's dominant class prediction (0.98) against the true dominant class, or from all the model's predictions against all the true values?
For example, say my model's output predictions for a single image for ['healthy','leaf_rust','stem_rust'] are [0.2, 0.7, 0.1] while the true values are [0.8, 0.4, 0.3]. Is the log loss calculated as:
a) log_loss(0.8, 0.2) # in this case, only the deviation between the model's predicted dominant class and the true dominant class is taken into account
b) log_loss(0.8, 0.2) + log_loss(0.7, 0.4) + log_loss(0.3, 0.1) # here, all the predictions for all the classes of the image are taken into account
Thank you in advance for your swift reply
There is only one true class for each image. The log loss is calculated against your prediction for that class; essentially, every prediction is log_loss(1, [your prediction for the ground-truth class]).
In your example, predictions for ['healthy','leaf_rust','stem_rust'] being [0.2,0.7,0.1]:
if the true class was leaf_rust, it's the log_loss of a true-class prediction of 0.7, which is a loss of 0.3567
if the true class was stem_rust, it's the log_loss of a true-class prediction of 0.1, which is a loss of 2.3026
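In code, that per-image loss is just the negative log of the probability you assigned to the true class. A minimal sketch (my own illustration, not Zindi's official scoring code):

```python
import math

def per_image_log_loss(probs, true_index):
    """probs: list of class probabilities; true_index: index of the true class."""
    return -math.log(probs[true_index])

preds = [0.2, 0.7, 0.1]  # ['healthy', 'leaf_rust', 'stem_rust']
print(round(per_image_log_loss(preds, 1), 4))  # true class leaf_rust -> 0.3567
print(round(per_image_log_loss(preds, 2), 4))  # true class stem_rust -> 2.3026
```

Notice the other two probabilities never enter the calculation; only the true class's probability matters.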
Thank you for this reply, but I just did a little tweaking of the model predictions in lieu of what you said, and the result I had contradicts your assumption. Could someone else, or perhaps the ZINDI organizers kindly comment on this please?
Keep in mind, probabilities will be scaled so that the total is 1. In a couple of the examples you gave, you have a trio of predictions that adds up to far greater than 1, such as [0.21, 0.98, 0.63]. You can't be 98% confident of one class and 63% confident of another. Typical logloss measurement will conveniently scale each of those probabilities down, then go about measuring against the scaled prediction of the true class. Similarly, a cap is ordinarily used to prohibit infinite loss on an incorrect 0% probability.
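Here is a rough sketch of that scaling and capping, assuming the typical behaviour described above (this is my own illustration, not Zindi's exact implementation):

```python
import numpy as np

def scaled_clipped_log_loss(true_index, probs, eps=1e-15):
    """Scale probabilities to sum to 1, cap them away from 0 and 1,
    then take -log of the (scaled) probability of the true class."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()            # scale so the trio sums to 1
    probs = np.clip(probs, eps, 1 - eps)   # cap to avoid log(0) = infinite loss
    return -np.log(probs[true_index])

# [0.21, 0.98, 0.63] sums to 1.82, so 0.98 is scaled down to ~0.538 before scoring
print(scaled_clipped_log_loss(1, [0.21, 0.98, 0.63]))  # about 0.619
```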
And I didn't mention it, but it's the average of all individual losses, whereas in your original question you have a sum. It's a bit hard to follow your examples because few of them actually fit inside 100%. But it's neither A nor B. There is no such thing as a dominant class; it's merely whatever probability you assigned to the true class. For an image of a healthy plant, you'll get the same logloss if you predict ['healthy','leaf_rust','stem_rust'] as [0.4, 0.3, 0.3] as you will if you predict [0.4, 0.59, 0.01]. It's the logloss of 0.4 (which is ~0.9163).
You're right that Zindi may have an alternative implementation, but this is the standard: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html
from sklearn.metrics import log_loss
log_loss(["a","b","c"],[[0.4,0.35,0.25],[0.4,0.35,0.25],[0.4,0.35,0.25]]) ## 1.1174690724975747
log_loss(["a","b","c"],[[0.4,0.59,0.01],[0.4,0.35,0.25],[0.4,0.35,0.25]]) ## 1.1174690724975747
So the dominant class changed in the first prediction set with no effect on the score, since p(b) isn't used in the calculation for that, as "a" is the answer.
You can work it out by hand if you want; here's the average of the negative logs of sample 1's A, sample 2's B, and sample 3's C probabilities:
(-log(0.4) + -log(0.35) + -log(0.25))/3 ## 1.1174690724975747
And just to show that the class matters, using an example where the positive predictions aren't identical, here is the result if the altered class was the correct one and the 1.B, 2.A, 3.C probabilities are used:
log_loss(["b","a","c"],[[0.4,0.59,0.01],[0.4,0.35,0.25],[0.4,0.35,0.25]]) ## 0.9434059450254725
(-log(0.59) + -log(0.4) + -log(0.25))/3 ## 0.9434059450254725
Consider the 'true' labels to be of the form [0, 1, 0] and your predictions to be [0.3, 0.99, 0.67] for each image. The easiest way to calculate a score is `log_loss(y_true, y_pred)`. You could also do `log_loss(y_true.flatten(), y_pred.flatten())`. Importantly, it takes into account all predictions for all classes, not just the dominant class.
Given a dataframe 'preds' that looks like the sample submission, and one 'reference' that has the correct classes encoded in the same way, you can use `log_loss(reference[classes], preds[classes])` (provided they're in the same order!)
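A minimal self-contained sketch of that dataframe check (the frames here are toy stand-ins I've made up, not the official Zindi files):

```python
import pandas as pd
from sklearn.metrics import log_loss

classes = ['healthy', 'leaf_rust', 'stem_rust']

# 'reference' holds one-hot true labels, 'preds' holds model probabilities
reference = pd.DataFrame([[1, 0, 0], [0, 1, 0]], columns=classes)
preds = pd.DataFrame([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]], columns=classes)

# Selecting `classes` on both frames keeps the columns in the same order
score = log_loss(reference[classes], preds[classes])
print(score)  # average of -log(0.8) and -log(0.7), about 0.2899
```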
(Screenshot example: https://pasteboard.co/ITnW7Ad.png)
Thank you very much for the clarification
Thank you for this clarification. I understand it very well now
Please Chrisjay, help me to understand. For example, if I got [0.1, 0.7, 0.2] in the softmax results, how can I convert this to log loss?
I don't really understand your question, but I will try to answer by referring to Johnowhitaker's post above:
"If you consider the 'true' labels to be of the form [0, 1, 0] and your predictions to be [0.1, 0.7, 0.2] for each image. The easiest way to calculate a score is `log_loss(y_true, y_pred)`. "
So you calculate the log_loss based on your predictions [0.1, 0.7, 0.2] and the true labels (which you don't know, but say for example they're [0, 1, 0]).
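Concretely, for that one image, assuming the true label really is [0, 1, 0]:

```python
import math

y_pred = [0.1, 0.7, 0.2]  # softmax output for one image
y_true = [0, 1, 0]        # assumed one-hot true label (leaf_rust)

# Only the term where y_true is 1 contributes: -log(0.7)
loss = -sum(t * math.log(p) for t, p in zip(y_true, y_pred))
print(round(loss, 4))  # 0.3567
```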
So in the submission file, do we submit the log_loss or the probabilistic results from the model prediction?
If I use log_loss from scikit-learn, all I get is a single number, say 0.8. I don't know how I can put this in the submission file.
What you submit are the probabilistic results from model prediction.
Now I understand, thank you.