Hey,
A few people have already posted about the SampleSubmission.csv file containing additional field_ids that are not in the test data but are required in the submission.
I also noticed that the column order of the crops in SampleSubmission.csv differs from the order in the Tutorial notebook, which likely explains why everyone's scores are so bad. I adjusted the relevant part of the notebook code (see below) to rename the prediction columns so they line up with SampleSubmission.csv. This gave me a much better score, now close to what I was expecting based on my local testing. I hope this helps everyone until the organisers fix the issue.
Code changes:
# Map the model's class indices to the crop column names expected by SampleSubmission.csv
pred_df = pred_df.rename(columns={
    7: 'Crop_ID_1',
    2: 'Crop_ID_2',
    0: 'Crop_ID_3',
    1: 'Crop_ID_4',
    8: 'Crop_ID_5',
    5: 'Crop_ID_6',
    4: 'Crop_ID_7',
    6: 'Crop_ID_8',
    3: 'Crop_ID_9'
})
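If you prefer not to rely on a hard-coded mapping alone, you can also read the sample file and order your columns against it directly. This is only a minimal sketch, assuming pred_df already contains a Field_ID column plus the nine renamed Crop_ID_* columns, and that the sample file is in the working directory as SampleSubmission.csv (adjust names to your setup):

import pandas as pd

# Read the sample submission to get the exact column layout Zindi expects
sample = pd.read_csv('SampleSubmission.csv')

# Select the prediction columns in that same order and write the submission file
submission = pred_df[list(sample.columns)]
submission.to_csv('my_submission.csv', index=False)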
Thanks for this observation, greatly appreciated.
Thanks a lot @Just4Fun. This problem cost me 3-4 days of looking for errors in my own code. When I adjust the column order according to your post, I get the "real" cross-entropy score.
@Zindi: This issue should really be fixed, otherwise competitors who did not take part in the Hackathon will not get correct scoring results!
Wow! @Just4Fun, you are my hero! Thank you very much! I had been looking for a bug in my code for 3 days!