Microsoft Rice Disease Classification Challenge
Can you identify disease in images of rice grown in Egypt?
$3 000 USD
Computer Vision
Test set Ids != Sample Submission Ids
Data · 14 May 2022, 00:32 · 11

As you build your own baseline model, note that the Image IDs in the test set are not the same as those in the sample submission. To avoid running inference twice, you may want to load the sample submission and use it as your test dataset. This mismatch could also be the reason you haven't beaten the benchmark.

@zindi is this intentional? The sample submission is a tiny set compared to the test set, and there are ids that don't intersect.
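
One quick way to check a mismatch like this is to compare the two ID sets directly. The following is only a minimal sketch: the `Image_id` column name matches the code further down this thread, but the `Test.csv` and `SampleSubmission.csv` file names in the comments are assumptions, and the small frames are made-up stand-ins for the real files.

```python
import pandas as pd


def compare_ids(test_df, sample_df, col="Image_id"):
    """Return the IDs shared by both frames and the IDs unique to each."""
    test_ids = set(test_df[col])
    sample_ids = set(sample_df[col])
    return {
        "shared": test_ids & sample_ids,
        "only_in_test": test_ids - sample_ids,
        "only_in_sample": sample_ids - test_ids,
    }


# Made-up stand-in data; in practice you would load the real files, e.g.
# test_df = pd.read_csv("Test.csv")                  # assumed file name
# sample_df = pd.read_csv("SampleSubmission.csv")    # assumed file name
test_df = pd.DataFrame({"Image_id": ["id_a.jpg", "id_a_rgn.jpg", "id_b.jpg"]})
sample_df = pd.DataFrame({"Image_id": ["id_a.jpg", "id_c.jpg"]})

report = compare_ids(test_df, sample_df)
for name, ids in report.items():
    print(name, sorted(ids))
```

If `only_in_sample` is non-empty, building a submission from the test set alone would leave some required IDs out.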

Discussion 11 answers

@Professor Yes, I encountered the same thing, and like you I would also like to know from @Zindi whether this is intentional. That said, I don't think one needs to worry that SampleSubmission has a different number of IDs from the test dataset. To me, SampleSubmission is just there to show the inference/submission format we need to upload to the Zindi leaderboard. What matters here is the test data and its IDs; all we have to do is arrange the test data in the submission format required for this challenge. Still, I would like to hear from @Zindi on this.

14 May 2022, 05:58

Each image comes as 2 pictures: one taken with a normal camera and one with the MAPIR Survey3N.

You can choose to work only with the normal RGB images, only with the RG-NIR images, or with both; however, you only need to make submissions for one.

Yes, please read in the SampleSubmission when running inference.
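
As a rough illustration of reading in the SampleSubmission for inference, the sketch below predicts once per ID listed in the sample submission and keeps that file's row order. `fill_submission` and `predict_fn` are hypothetical names introduced here, the `SampleSubmission.csv` file name and the `Label` column are assumptions, and the dummy predictor only stands in for a real model.

```python
import pandas as pd


def fill_submission(sample_df, predict_fn, id_col="Image_id"):
    """Return a frame with the sample submission's IDs and fresh predictions.

    predict_fn is a hypothetical callable: image ID -> one value per label column.
    """
    label_cols = [c for c in sample_df.columns if c != id_col]
    rows = [predict_fn(image_id) for image_id in sample_df[id_col]]
    preds = pd.DataFrame(rows, columns=label_cols, index=sample_df.index)
    return pd.concat([sample_df[[id_col]], preds], axis=1)


# Made-up stand-in for pd.read_csv("SampleSubmission.csv") (assumed file name).
sample_df = pd.DataFrame({"Image_id": ["id_a.jpg", "id_b.jpg"], "Label": [0, 0]})

# Dummy predictor standing in for a real model: returns the length of the ID.
sub = fill_submission(sample_df, lambda image_id: [len(image_id)])
print(sub)
```

Driving inference from the sample submission means every required ID gets exactly one row, so inference does not have to be run a second time.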

Great, @amyflorida626. It's clearer now. Thanks.

Okay, I understand the part about having the choice of using either or both, but I'm having trouble understanding how you can submit for only one type. That means it's either the normal images or the RGN images, but the submission file has IDs only for the normal images. So can we also submit for the RGN images only? When I do, I get an error about missing entries for the IDs of the normal images. Kindly help.

Missing entries for IDs id_3zpci62t81.jpg, id_soncg0jffg.jpg, id_zlht6gd0y2.jpg, id_1m6n769g91.jpg, id_l613kk54tv.jpg and more

That is the error I'm getting after submitting only for RGN images. I would appreciate it if someone could help me understand.

@Koleshjr, I think I got your message.


import os
import pandas as pd

PATHS = {
    'cwd': './',             # Current directory
    'arch': 'MODELS/',       # Folder in which we're going to save our models
    'raw': '/content/',      # Folder containing the training files
    'Images': '/content/',   # Folder containing the image files
}

os.makedirs(PATHS['arch'], exist_ok=True)

train = pd.read_csv(PATHS['raw'] + 'Train.csv')
train = train[train.Image_id.str.endswith('_rgn.jpg')]   # Keep only the RGN images
train_images = set(train['Image_id'])                    # Set for fast membership checks

# All RGN images found on disk
images_list = [fn for fn in os.listdir(PATHS['Images']) if fn.endswith('_rgn.jpg')]

# RGN images that are not listed in Train.csv are treated as the test set
test_images_list = [fn for fn in images_list if fn not in train_images]

sub = pd.DataFrame(test_images_list, columns=['Image_id'])
sub['Label'] = 0


I appreciate it. Thanks. In short, since they are basically the same images, there is no need to add the "_rgn" in the submission file, and we use the submission file as it is?

Yeah, you're right. So there's no need to train only on RGN images. I don't think it's advisable to do that anyway.
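
For anyone who still wants to predict from the RGN images, one possible way to keep the submission file as it is would be to map each RGN file name back to its normal ID before merging. This is a sketch under an assumption: that an RGN image is named like its normal counterpart with `_rgn` before the extension (for example `id_x.jpg` and `id_x_rgn.jpg`). The helper `rgn_to_rgb_id` and all the data below are made up for illustration.

```python
import pandas as pd

RGN_SUFFIX = "_rgn.jpg"  # suffix used for RGN images in the code earlier in this thread


def rgn_to_rgb_id(image_id):
    """Map 'id_x_rgn.jpg' to 'id_x.jpg' (assumed naming); leave other IDs unchanged."""
    if image_id.endswith(RGN_SUFFIX):
        return image_id[: -len(RGN_SUFFIX)] + ".jpg"
    return image_id


# Made-up predictions produced from RGN images.
rgn_preds = pd.DataFrame({"Image_id": ["id_a_rgn.jpg", "id_b_rgn.jpg"], "Label": [1, 0]})
rgn_preds["Image_id"] = rgn_preds["Image_id"].map(rgn_to_rgb_id)

# Made-up stand-in for the sample submission; a left merge keeps its IDs and order.
sample_df = pd.DataFrame({"Image_id": ["id_b.jpg", "id_a.jpg", "id_c.jpg"]})
sub = sample_df.merge(rgn_preds, on="Image_id", how="left")

# IDs that received no prediction still need a value before submitting.
missing = sub.loc[sub["Label"].isna(), "Image_id"].tolist()
print(missing)
```

Checking `missing` before uploading is a cheap way to catch the "Missing entries for IDs" error quoted above.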