ICLR Workshop Challenge #1: CGIAR Computer Vision for Crop Disease
$5,000 USD
Identify wheat rust in images from Ethiopia and Tanzania, and win a trip to present your work at ICLR 2020 in Addis Ababa.
846 data scientists enrolled, 305 on the leaderboard
Agriculture, Computer Vision, Unstructured, SDG2
29 January—29 March
The competition is over
published 29 Mar 2020, 06:57

The competition is over. The shake-up has happened (unfortunately for us, not for the better). We don't quite understand why...

Anyway, we would be happy to read the ideas of the teams that took high places (if the rules allow this).

Hi Sir-G, it would be nice if you could share too, since you ended up in the top 4%. Actually, I have a few questions for anyone who wants to share their approach:

- Which framework did you use? PyTorch or TensorFlow (fastai or Keras)?

- Did you do any data-specific preprocessing before training? Which augmentations did you use during training?

- Any special architecture or training schedule?

I'd be happy to hear from you and also to share with anyone who asks :)

I've found a lot of duplicates in the train set and in the test set. Removing duplicates from the train set is good (a few duplicates have different labels; I decided to drop those). It seems some samples have wrong labels, but correcting them didn't help.
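A minimal sketch of that clean-up, assuming duplicates are exact byte-for-byte copies and that each sample is an (image bytes, label) pair. The function name, the MD5 digest and the toy data are illustrative assumptions, not the poster's actual code.

```python
import hashlib
from collections import defaultdict


def dedupe_training_set(samples):
    """Collapse exact duplicates and drop groups with conflicting labels.

    samples: list of (image_bytes, label) pairs (hypothetical layout).
    """
    groups = defaultdict(list)
    for image_bytes, label in samples:
        # Identical files produce identical digests.
        digest = hashlib.md5(image_bytes).hexdigest()
        groups[digest].append((image_bytes, label))

    kept = []
    for members in groups.values():
        labels = {label for _, label in members}
        if len(labels) == 1:
            # All copies agree: keep a single copy.
            kept.append(members[0])
        # Copies that disagree on the label are dropped entirely.
    return kept


# Toy data standing in for real image files.
toy = [
    (b"img-a", "leaf"),
    (b"img-a", "leaf"),   # clean duplicate
    (b"img-b", "stem"),
    (b"img-b", "leaf"),   # duplicate with a conflicting label
    (b"img-c", "stem"),
]
deduped = dedupe_training_set(toy)
```

An exact digest only catches identical files; resized or re-encoded copies need a perceptual hash instead.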

The main failure is that I decided to copy labels from the train set to the test set where possible (for duplicates, ~250 samples). It seems the labels are noisy: this worked on the public part of the test set, but not on the private part. :( We got a big penalty for these labels. Judging by late submissions, I guess we could have achieved at least ~0.20. Our current result is the old single model, without copied labels.

Thanks for sharing this. I also applied the same technique to get rid of duplicates, assigning labels to the incorrect ones.

But I found that the scores dropped.

I got a score of 0.28 using FastAi model with Densenet201 for image size 256, bs=32.

It would be great if you could share your approach so that it would be helpful for those who are learning (like me :) ) in the future

Thanks for sharing. I also used fastai and a densenet201 model. Did not eliminate duplicates. But could manage a top score of 0.41 only. Mostly standard code. It would be interesting if you could share your code.

My code link https://anindabitm.github.io/anindadslog/2020/03/25/ICLR_crop_diseases_fastai.html


> assigning labels to the incorrect ones.

To be honest, I long misunderstood what "stem rust" and "leaf rust" actually are. Only 10 days before the end did we read up on it. :) "Stem rust" can appear on leaves, and "leaf rust" can appear on a stem. They differ by the colour and shape of the spots, not by the location of the disease.

> I got a score of 0.28 using FastAi model with Densenet201 for image size 256, bs=32.

I don't know your full pipeline, but I suggest:

1) Use pseudo-labeling (but do not use test images that are duplicates of train images); it seems to give a good boost here.

2) Try a higher resolution. We used 360-380 (and more), because the diseased area is sometimes quite small. Or, perhaps, use more aggressive crops while augmenting.

3) We found that the median works better than the average for combining fold results (we have 5 folds).

4) Use different models and then combine them (by averaging, for example).
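Suggestions 3 and 4 can be sketched as follows. This is only an illustration: the function names, the three-class layout and the toy probabilities are assumptions, and renormalising after the median is one possible choice rather than something stated in the thread.

```python
from statistics import mean, median


def combine_folds(fold_probs, reducer=median):
    """Combine per-fold class probabilities for one image.

    fold_probs: one probability list per fold, e.g. 5 lists of 3 values.
    """
    combined = [reducer(values) for values in zip(*fold_probs)]
    total = sum(combined)
    # A per-class median need not sum to 1, so renormalise.
    return [p / total for p in combined]


def combine_models(model_probs):
    """Average the (already fold-combined) predictions of several models."""
    return [mean(values) for values in zip(*model_probs)]


# Toy example: four folds agree, one fold is an outlier.
folds = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.75, 0.15, 0.10],
    [0.10, 0.80, 0.10],  # outlier fold
    [0.80, 0.10, 0.10],
]
by_median = combine_folds(folds, reducer=median)
by_mean = combine_folds(folds, reducer=mean)
ensemble = combine_models([by_median, by_mean])
```

In the toy data the median ignores the outlier fold, while the mean is pulled towards it. That robustness is one plausible reason the median could score better when some folds are trained on noisy labels.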

Hello Guys,

I had to send the description to the organizers. Once they have evaluated it, I may be allowed to share the full description of the 2nd place solution.


I have been working with pure TensorFlow, smaller architectures and higher input resolution than you. Train-time augmentations helped a lot. A single net with a single prediction was in 1st place for a long period. Test-time augmentations helped a lot. Multiple nets did not help much.
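For readers new to test-time augmentation, here is a minimal sketch of the general idea. The post does not say which augmentations were used, so the flips below, the function names and the toy "model" are all assumptions made for illustration.

```python
def identity(image):
    return image


def hflip(image):
    # Mirror each row (left-right flip).
    return [row[::-1] for row in image]


def vflip(image):
    # Reverse the row order (top-bottom flip).
    return image[::-1]


def predict_with_tta(predict, image, transforms=(identity, hflip, vflip)):
    """Average the class probabilities predicted for several views of one image."""
    all_probs = [predict(transform(image)) for transform in transforms]
    n = len(all_probs)
    return [sum(class_probs) / n for class_probs in zip(*all_probs)]


def toy_predict(image):
    # Hypothetical stand-in for a trained network: it only looks at one pixel.
    p = image[0][0]
    return [p, 1.0 - p]


image = [[0.9, 0.3],
         [0.6, 0.0]]
tta_probs = predict_with_tta(toy_predict, image)
```

Each view gives a slightly different prediction, and averaging them smooths out the model's sensitivity to orientation.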

All Best,



Leaderboard is updated. Many congrats to picekl and Val_An! Looking forward to hearing the solution.

For some reason, Zindi couldn't reproduce our score, and as far as I understand, the same happened to many teams. We don't care much, but I think the VinBDI.MedicalImagingTeam is upset. ;) Anyway, it would be nice to read the top-2 solutions.

So here's my architecture

1. With bs=32, input size 256 and data augmentations applied, I trained densenet201 with different learning rates

2. Unfroze the model and trained further for some epochs using different learning rates

3. Applied TTA on the test set

4. Got 0.28

I tried a lot of methods after that, such as resizing and averaging models (used resnet 151, efficientnetb4), but my score didn't get better than 0.28.

Unfortunately, I encountered memory issues when I was trying higher-resolution images (greater than 256), so I was kind of restricted.

> Unfortunately, I encountered memory issues when I was trying higher-resolution images (greater than 256)

Did you try to decrease batch size? I used bs=6 for res=380.
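A rough back-of-the-envelope check of why this trade-off works, assuming that activation memory for a fixed convolutional network grows roughly in proportion to batch size times image area. That proportionality is a rule of thumb, not an exact model, and the helper name is hypothetical.

```python
def pixels_per_batch(batch_size, resolution):
    """Number of input pixels (per channel) in one batch of square images."""
    return batch_size * resolution * resolution


baseline = pixels_per_batch(32, 256)    # bs=32 at 256x256
high_res = pixels_per_batch(6, 380)     # bs=6 at 380x380

# Largest batch size at 380x380 that stays within the baseline pixel budget.
max_bs_at_380 = baseline // (380 * 380)

print(baseline, high_res, max_bs_at_380)
```

Under this assumption, bs=6 at 380 uses well under half the pixel budget of bs=32 at 256, so a setup that fits the latter should have room for the former. A different architecture may still change the picture.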

I have found the two unfortunate examples that I copied:

1JPXPR (test, labelled as "leaf") is duplicate of U2V5YV (train, labelled as "stem")

YYYK9H (test, labelled as "stem") is a small copy of 9AEUF2 (train, labelled as "leaf")

I counted with a hash function (no visual inspection) that 256 test images could be removed as duplicates. 337 + 256 = 593 of the 610 test images, so I'm sure I missed some.

Yes, probably many duplicates are excluded from the scoring. But not all; for example, here is one for which we got a penalty:

1JPXPR (test, labelled as "leaf") is duplicate of U2V5YV (train, labelled as "stem")

By the way, I found 253 duplicates with phash (not all are exact duplicates; some are just very similar). You've found some more. :)
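To illustrate how a perceptual hash can match images that are similar but not identical, here is a sketch using a simple average hash with a Hamming-distance threshold. Note that this is a simpler stand-in, not the DCT-based phash mentioned above; the threshold, the ids and the tiny 4x4 "images" are assumptions for the example.

```python
def average_hash(gray):
    """Hash a small grayscale grid: 1 where a pixel is above the mean, else 0."""
    pixels = [value for row in gray for value in row]
    threshold = sum(pixels) / len(pixels)
    return [1 if value > threshold else 0 for value in pixels]


def hamming(hash_a, hash_b):
    """Number of positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def near_duplicates(test_hashes, train_hashes, max_distance=2):
    """Return (test_id, train_id) pairs whose hashes are within max_distance bits."""
    pairs = []
    for test_id, test_hash in test_hashes.items():
        for train_id, train_hash in train_hashes.items():
            if hamming(test_hash, train_hash) <= max_distance:
                pairs.append((test_id, train_id))
    return pairs


# Toy 4x4 grids standing in for images already resized to a small grid.
original = [[10, 200, 10, 200],
            [200, 10, 200, 10],
            [10, 200, 10, 200],
            [200, 10, 200, 10]]
brighter = [[value + 5 for value in row] for row in original]  # slightly altered copy
other = [[10, 10, 200, 200] for _ in range(4)]                 # unrelated pattern

matches = near_duplicates(
    {"test_a": average_hash(brighter), "test_b": average_hash(other)},
    {"train_a": average_hash(original)},
)
```

The brightened copy hashes identically to the original, while the unrelated pattern is 8 bits away, so only the first pair is reported. A looser threshold finds more near-duplicates at the cost of more false matches.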

Thank you very much! That might be the reason why I got very unstable results when re-running the same script. I think the noisy images contributed to that error. I also found out that applying different scale factors to the augmented images led to quite different scores.

Hi Lukas,

Thank you for sharing, I'll be looking forward to the release of the complete solution. Great work!



Thanks for this thread. It was a very nice learning experience for me. I basically overfit to the leaderboard. My solution can be found here https://github.com/krishnakalyan3/iclr-crop-disease-zindi

Hello. This was a fun little challenge. The noisy labels were frustrating. What worked best for me was fastai2 with senet154 as the network, specifically gluon_senet154 from https://github.com/rwightman/pytorch-image-models and mixup. Vanilla solution, no pseudolabelling or tta or ensembling.
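For anyone unfamiliar with mixup, a minimal sketch of the idea: two training samples and their one-hot labels are blended with a weight drawn from a Beta distribution. The alpha value, the function name and the flat-list "images" are assumptions for illustration; this is not the fastai2 implementation.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two samples and their one-hot labels with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


rng = random.Random(0)  # seeded only to make the example repeatable
x_leaf, y_leaf = [0.2, 0.4, 0.6], [1.0, 0.0]   # toy "image" and one-hot label
x_stem, y_stem = [0.9, 0.7, 0.5], [0.0, 1.0]
mixed_x, mixed_y, lam = mixup(x_leaf, y_leaf, x_stem, y_stem, rng=rng)
```

Because the targets become soft, a mislabelled image contributes a weaker, blended signal, which may be part of why mixup helped with the noisy labels here.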

The interesting thing I have found: Zindi is using only 337 of 610 samples for the final evaluation. I don't know why they dropped the other samples.

Hi All,

Is there any chance that we can get an approximate number of misclassifications based on our score?

E.g., I scored 0.28 on 610 test images, so is it possible to get an idea of how many misclassifications my model made? I am working on some analysis and hence would need that number.

It would be great if someone could help :)
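A small sketch that may help with this question, under the assumption that the leaderboard score is a log loss (lower is better); the thread itself does not name the metric. If so, the score depends on how confident each prediction is, not just on whether it is right, so the same number of misclassifications can give very different scores. The toy labels and probabilities below are made up.

```python
import math


def log_loss(true_labels, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, p in zip(true_labels, probs):
        clipped = min(max(p[label], eps), 1.0 - eps)
        total += -math.log(clipped)
    return total / len(true_labels)


def misclassified(true_labels, probs):
    """Count samples whose highest-probability class is not the true class."""
    errors = 0
    for label, p in zip(true_labels, probs):
        predicted = max(range(len(p)), key=p.__getitem__)
        if predicted != label:
            errors += 1
    return errors


labels = [0, 0, 0, 0]
cautious = [[0.6, 0.4], [0.6, 0.4], [0.6, 0.4], [0.4, 0.6]]
confident = [[0.99, 0.01], [0.99, 0.01], [0.99, 0.01], [0.45, 0.55]]

cautious_errors = misclassified(labels, cautious)
confident_errors = misclassified(labels, confident)
cautious_loss = log_loss(labels, cautious)
confident_loss = log_loss(labels, confident)
```

Both toy models make exactly one mistake, yet their losses differ by roughly a factor of three. Under that assumption, a score of 0.28 alone does not pin down a misclassification count; one would need the per-image predictions and labels.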