Radiant Earth Spot the Crop Challenge

sebastianegli

Philipps-university marburg

Cross entropy implementation for scoring

Data · 1 Aug 2021, 12:44 · 6

Dear Zindi-Team,

I am having trouble calculating the cross entropy metric on my internal test data (a 30% subset of the provided training data) using the scikit-learn implementation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html#sklearn-metrics-log-loss

In other words: the cross entropy I get from scikit-learn's log_loss function is completely different from the score calculated for my submission. I know there must be some difference, because the real test set differs from my internal test set. However, I would expect the results to be at least roughly similar. Re-implementing the metric as stated on your website also gives me very different results.

Can you please provide the implementation that you use to calculate the cross entropy metric?

Thank you and best regards, Sebastian

Discussion 6 answers

sebastianegli

Philipps-university marburg

Dear amyflorida626,

I use the sklearn.metrics.log_loss method like this:

sklearn.metrics.log_loss(y_true,y_pred)

where y_true is a 1-D array of the true labels (crop types as numbers from 1 to 9), one per field, and y_pred is a 2-D array with one row per field holding the probability assigned to each class.

In the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html) I find nothing about how to select "multi-class". Can you give me a hint on how to call the method correctly?
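For reference, log_loss has no separate "multi-class" switch: when y_pred has one column per class, the multi-class case is handled automatically. What matters is that the columns of y_pred follow the sorted order of the labels. The following is a minimal sketch with made-up toy data (3 classes instead of the 9 in the challenge); the optional labels argument makes the column-to-class mapping explicit.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical toy data: 4 fields, 3 crop classes labelled 1..3
# (the challenge itself uses 9 classes labelled 1..9).
y_true = np.array([1, 3, 2, 3])

# One row per field, one column per class. Column k must hold the
# probability of the k-th label in sorted order, here 1, 2, 3.
y_pred = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
])

# Passing `labels` makes the column-to-class mapping explicit and also
# covers the case where a class never appears in y_true.
score = log_loss(y_true, y_pred, labels=[1, 2, 3])
print(score)
```

If the columns of y_pred are in any other order than the sorted labels, the function still returns a number, but it scores the wrong probabilities.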

6 Aug 2021, 15:28

sebastianegli

Philipps-university marburg

Also, implementing the cross entropy score as stated on the website gives me different results from what I get from the submission :(

However, some parts of the equation given are not clear to me:

(1) How do you handle the case where p_j,i is zero, which makes ln(p_j,i) undefined?

(2) What is "J"? Cross entropy value of one field? And how is the complete score calculated? (Averaging over all Js? Summing over all Js?)

Can you please provide the implementation of the submission system so that I can check my results with that? Thanks a lot!
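The thread does not reveal how the submission system answers these two questions. One common convention, which scikit-learn's log_loss also follows by default, is to clip probabilities away from 0 and 1 before taking the logarithm and to average the per-field losses. The sketch below implements that convention in plain Python; the function name, the eps value of 1e-15, and the choice of a mean rather than a sum are assumptions, not the confirmed Zindi implementation.

```python
import math

def cross_entropy(y_true, y_pred, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of true labels, one per field.
    y_pred: list of rows; row i holds one probability per entry of `labels`.
    eps:    clipping bound (an assumed value, not confirmed by Zindi).
    """
    column = {label: k for k, label in enumerate(labels)}
    total = 0.0
    for label, row in zip(y_true, y_pred):
        p = row[column[label]]
        # Clip so that ln(0) never occurs; a zero becomes eps instead.
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p)
    # Average over fields (assumption: the score is a mean, not a sum).
    return total / len(y_true)

y_true = [1, 2]
y_pred = [[0.5, 0.5], [1.0, 0.0]]  # second field gives its true class p = 0
score = cross_entropy(y_true, y_pred, labels=[1, 2])
print(score)
```

With clipping, a single confident but wrong prediction adds a large finite penalty instead of making the whole score infinite.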

9 Aug 2021, 21:29

sebastianegli

Philipps-university marburg

OK, I found the problem. The crop order given in the sample submission file is wrong (or at least the submission system expects a different order). When you order the submission file columns according to this post, you get the "real" cross entropy score: https://zindi.africa/hackathons/radiant-earth-spot-the-crop-hackathon/discussions/6728

@Zindi: This should be corrected or more clearly pointed out in a suitable place!
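A minimal sketch of such a reordering with pandas is shown here. The column names and both orders are hypothetical placeholders; the order actually expected by the submission system is the one given in the linked post.

```python
import pandas as pd

# Hypothetical column names; the real order is in the linked post.
file_order = ["Field ID", "Crop_A", "Crop_B", "Crop_C"]
expected_order = ["Field ID", "Crop_C", "Crop_A", "Crop_B"]

submission = pd.DataFrame(
    [[101, 0.7, 0.2, 0.1],
     [102, 0.1, 0.3, 0.6]],
    columns=file_order,
)

# Selecting by name moves each probability column together with its
# header, so every value stays attached to the right crop.
reordered = submission[expected_order]
print(reordered.columns.tolist())

# Writing the file is left commented out in this sketch:
# reordered.to_csv("submission.csv", index=False)
```

Reordering by column name, rather than by position, avoids silently shifting probabilities onto the wrong crop.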

10 Aug 2021, 08:52

Lone_Wolf

University of ghana

Hi @sebastianegli. After the modifications mentioned in https://zindi.africa/hackathons/radiant-earth-spot-the-crop-hackathon/discussions/6728, are you seeing any correlation between cv scores and public lb?

replied to sebastianegli · 10 Aug 2021, 14:12

sebastianegli

Philipps-university marburg

I'm sorry, what do you mean by "cv scores and public lb" exactly?

What I can say: The cross entropy scores I get when testing internally using the sklearn.metrics.log_loss method now closely match those that I get from the submission system.

replied to Lone_Wolf · 10 Aug 2021, 14:27

Lone_Wolf

University of ghana

Sure, that helps. Thanks.

replied to sebastianegli · 10 Aug 2021, 17:33
