
Radiant Earth Spot the Crop Challenge

Helping South Africa
$8 800 USD
Completed (over 4 years ago)
Classification
Earth Observation
559 joined
101 active
Start: Jul 05, 21
Close: Oct 03, 21
Reveal: Oct 03, 21
Philipps-university marburg
Cross entropy implementation for scoring
Data · 1 Aug 2021, 12:44 · 6

Dear Zindi-Team,

I am having trouble calculating the cross entropy metric on my internal test data (a 30% subset of the training data provided) using the scikit-learn implementation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html#sklearn-metrics-log-loss

In other words: the cross entropy I calculate with scikit-learn's log_loss function is completely different from the score reported for my submission. I know there must be some difference, because the real test data set differs from my internal test data set. However, I would expect the results to be at least roughly similar. Re-implementing the metric as stated on your website also gives me very different results.

Can you please provide the implementation that you use to calculate the cross entropy metric?

Thank you and best regards, Sebastian

Discussion 6 answers
Philipps-university marburg

Dear amyflorida626,

I use the sklearn.metrics.log_loss method like this:

sklearn.metrics.log_loss(y_true,y_pred)

where y_true is a 1d-array of the true labels (crop types as numbers from 1 to 9) for each field and y_pred is a 2d-array of the probabilities given to each class in each field.

In the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html) I cannot find anything about how to select "multi-class". Can you give me a hint on how to call the method correctly?
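
For reference, a minimal sketch of a multi-class call on hypothetical toy data. It assumes the columns of y_pred follow the sorted class labels 1 to 9. As far as the documentation goes, there is no separate multi-class switch: a 2d y_pred is read as per-class probabilities, and the labels argument names the classes when y_true does not contain all of them.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical toy data: 3 fields, crop classes numbered 1..9.
labels = list(range(1, 10))
y_true = np.array([1, 4, 9])

# One row per field, one column per class, columns in the same order as `labels`.
y_pred = np.full((3, 9), 0.05)
y_pred[0, 0] = 0.6  # field 0: 60% on class 1
y_pred[1, 3] = 0.6  # field 1: 60% on class 4
y_pred[2, 8] = 0.6  # field 2: 60% on class 9

# Passing `labels` matters here because y_true contains only 3 of the 9 classes.
score = log_loss(y_true, y_pred, labels=labels)
print(score)
```

If the columns of y_pred are in any other order than the sorted labels, the score is computed against the wrong classes without any warning.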

6 Aug 2021, 15:28
Philipps-university marburg

Also, implementing the cross entropy score as stated on the website gives me different results from what I get from the submission :(

However, some parts of the equation given are not clear to me:

(1) How do you handle the case where p_j,i is zero, so that ln(p_j,i) is undefined?

(2) What is "J"? Cross entropy value of one field? And how is the complete score calculated? (Averaging over all Js? Summing over all Js?)

Can you please provide the implementation of the submission system so that I can check my results with that? Thanks a lot!
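
To make the two questions concrete, here is a minimal sketch of one common way to compute the score. It is not Zindi's implementation, which is not shown in this thread. It assumes that probabilities are clipped to a small eps (the value 1e-15 is a hypothetical choice) so the logarithm is always defined, and that the per-field values are averaged rather than summed.

```python
import math

def cross_entropy(y_true, y_prob, labels, eps=1e-15):
    """Mean multi-class cross entropy over all fields.

    y_true: true class label per field
    y_prob: one row of class probabilities per field, columns ordered as `labels`
    eps:    clipping bound (assumed value) that keeps log() defined for p == 0
    """
    column_of = {label: k for k, label in enumerate(labels)}
    total = 0.0
    for label, row in zip(y_true, y_prob):
        # Only the probability given to the true class contributes.
        p = min(max(row[column_of[label]], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)

# A zero probability on the true class gives a large but finite penalty.
worst_case = cross_entropy([1], [[0.0, 1.0]], labels=[1, 2])
mixed = cross_entropy([1, 2], [[0.5, 0.5], [0.0, 1.0]], labels=[1, 2])
```

Under these assumptions a single confident miss costs about 34.5, which shows why the clipping bound and the choice between averaging and summing both change the reported number considerably.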

9 Aug 2021, 21:29
Philipps-university marburg

Ok, I found the problem. The crop order as given in the sample submission file is wrong (or at least the submission system looks for another order). When you order the submission file columns according to this post, you will get the "real" cross entropy score: https://zindi.africa/hackathons/radiant-earth-spot-the-crop-hackathon/discussions/6728

@Zindi: This should be corrected or more clearly pointed out in a suitable place!
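
For anyone hitting the same issue, a minimal sketch of reordering the submission columns before upload. The column names below are hypothetical placeholders. The order the submission system actually expects is the one given in the linked discussion and is not reproduced here.

```python
def reorder_columns(header, rows, expected_order):
    """Return (header, rows) with columns rearranged to match expected_order."""
    missing = [name for name in expected_order if name not in header]
    if missing:
        raise ValueError(f"columns missing from submission: {missing}")
    positions = [header.index(name) for name in expected_order]
    return list(expected_order), [[row[i] for i in positions] for row in rows]

# Hypothetical names for illustration only.
header = ["Field ID", "Crop_A", "Crop_B", "Crop_C"]
rows = [["f1", 0.7, 0.2, 0.1]]
expected = ["Field ID", "Crop_C", "Crop_A", "Crop_B"]

new_header, new_rows = reorder_columns(header, rows, expected)
```

Reordering by column name rather than by position keeps each probability attached to its crop, whatever order the model produced them in.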

10 Aug 2021, 08:52
Lone_Wolf
University of ghana

Hi @sebastianegli, after the modifications mentioned in https://zindi.africa/hackathons/radiant-earth-spot-the-crop-hackathon/discussions/6728, are you seeing any correlation between your CV scores and the public LB?

Philipps-university marburg

I'm sorry, what do you mean by "cv scores and public lb" exactly?

What I can say: The cross entropy scores I get when testing internally using the sklearn.metrics.log_loss method now closely match those that I get from the submission system.

Lone_Wolf
University of ghana

Sure, that helps. Thanks!