
Ghana Crop Disease Detection Challenge

Helping Ghana
$8 000 USD
Completed (over 1 year ago)
Computer Vision
Object Detection
2205 joined
344 active
Start: Oct 04, 24
Close: Dec 15, 24
Reveal: Dec 15, 24
Koleshjr
Multimedia university of kenya
Top Solutions
Platform · 16 Dec 2024, 04:40 · 20

Would really love to hear how top teams got those impressive results given the time limit and compute restrictions.

Discussion 20 answers
nymfree

+1. For us, the 9-hour training limit allowed us to train and ensemble two folds: one based on a 20-fold split (CV=0.491) and the other on a 24-fold split (CV=0.52).

The ensemble was based on WBF (weighted boxes fusion), and we trained RT-DETR models for around 40 epochs.
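For readers unfamiliar with WBF: unlike NMS, it averages overlapping boxes weighted by confidence instead of discarding all but the highest-scoring one. A minimal single-class sketch of the idea (the real ensemble-boxes library additionally handles multiple classes, per-model weights, and normalized coordinates):

```python
import numpy as np

def box_iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse(cluster):
    """Score-weighted average box of a cluster of (box, score) pairs."""
    boxes = np.array([b for b, _ in cluster], dtype=float)
    w = np.array([s for _, s in cluster], dtype=float)
    return (boxes * w[:, None]).sum(axis=0) / w.sum()

def wbf(boxes, scores, iou_thr=0.65, skip_thr=0.0):
    """Single-class weighted boxes fusion over pooled model outputs."""
    pairs = sorted(
        ((b, s) for b, s in zip(boxes, scores) if s >= skip_thr),
        key=lambda t: -t[1],
    )
    clusters = []
    for b, s in pairs:
        for cl in clusters:
            # join the first cluster whose fused box overlaps enough
            if box_iou(fuse(cl), b) > iou_thr:
                cl.append((b, s))
                break
        else:
            clusters.append([(b, s)])
    return [(fuse(cl), float(np.mean([s for _, s in cl]))) for cl in clusters]
```

In practice the boxes from both fold models would be pooled together before fusing.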

16 Dec 2024, 07:01
Upvotes 3
Koleshjr
Multimedia university of kenya

Thanks @nymfree, that's impressive. I personally avoided ensembles, but I'm glad to hear one could fit in the 9-hour limit. What IoU did you use for the WBF ensemble, and what skip threshold?

Also, what image size and batch size did you use?

Thank you.

nymfree

We used 640 image size and a 0.65 IoU threshold, and kept the default value for the skip threshold.

Koleshjr
Multimedia university of kenya

Great, thanks!

nymfree

What did you use? Which model and image resolution?

Koleshjr
Multimedia university of kenya

A single YOLO11x, image size 800, trained for 52 epochs.
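For reference, a single-model run like this maps onto the Ultralytics CLI roughly as follows (the dataset YAML name is a placeholder, not the competition's actual file):

```shell
# data.yaml is a hypothetical dataset config pointing at the competition images/labels
yolo detect train model=yolo11x.pt data=data.yaml imgsz=800 epochs=52
```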

CodeJoe

All along I was training for 100 epochs😭😭. I used YOLO11s throughout with 1024px imgsz on a 10-fold split (CV=49.3).

CodeJoe

@Koleshjr @nymfree Is there a trick to boost your model's score on the test set after training?

nymfree

Computational cost scales quadratically with image resolution. Very likely only one model can successfully be trained within 9 hours at 1024 resolution.
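A quick back-of-envelope check of that scaling claim, assuming per-image compute tracks pixel count:

```python
def relative_cost(new_side: int, old_side: int) -> float:
    """Per-image compute scales with pixel count, i.e. quadratically in side length."""
    return (new_side / old_side) ** 2

# Moving from 640px to 1024px sides costs about 2.56x more per image,
# so a 9-hour budget that fits two 640px runs fits roughly one 1024px run.
ratio = relative_cost(1024, 640)
```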

CodeJoe

Yes, it took about 7 hours to train. I thought it gave a better result, so why not😌

nymfree

Other than the TTA built into Ultralytics, there is SAHI. Here, you basically infer on patches of high-resolution images: e.g., if the image has 2048x2048 resolution, you don't resize it to 1024, but infer on four 1024x1024 patches. That might preserve high-resolution details. I tried it early on when I had weak models and it didn't seem to make things better. In hindsight, I should have revisited it.
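The patch-inference idea can be sketched without the SAHI library itself: split the image into tiles, run the detector on each tile, then shift each tile's boxes back into full-image coordinates. A minimal no-overlap version (SAHI additionally supports overlapping slices and merges duplicate detections):

```python
def tile_origins(width: int, height: int, tile: int):
    """Top-left corners of non-overlapping tiles covering a width x height image."""
    return [(x, y) for y in range(0, height, tile) for x in range(0, width, tile)]

def to_image_coords(box, origin):
    """Shift a tile-local [x1, y1, x2, y2] box back into full-image coordinates."""
    ox, oy = origin
    x1, y1, x2, y2 = box
    return [x1 + ox, y1 + oy, x2 + ox, y2 + oy]

# A 2048x2048 image split into 1024px tiles yields the four patches
# described above; a detection in the top-right tile is shifted by (1024, 0).
patches = tile_origins(2048, 2048, 1024)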

Koleshjr
Multimedia university of kenya

Actually, you could train two models at 1024px given that he was using the smaller version of YOLO11, but then you would have to sacrifice the number of epochs you train each of the two models for.

CodeJoe

I also tried SAHI. In this competition, it wasn't helpful at all.

Koleshjr
Multimedia university of kenya

First of all, everyone who couldn't crack a score of 40 was most likely using the default YOLO confidence threshold of 0.25. I really struggled with this at the beginning of the comp, so using a really small threshold helped. Playing with the IoU in the YOLO predict call also helped: for example, 0.5 worked better for me than the default 0.7. Applying TTA helped as well.
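On the Ultralytics side, those settings correspond to predict-time arguments along these lines (the model and source paths are placeholders):

```shell
# conf: confidence threshold (default 0.25 - too high for mAP-style metrics)
# iou: NMS IoU threshold (default 0.7)
# augment: enables built-in test-time augmentation
yolo detect predict model=best.pt source=test_images/ conf=0.001 iou=0.5 augment=True
```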

CodeJoe

I did all these things. My IoU was 0.559 and my confidence was 0.001, but I still couldn't crack the 50 score.

CodeJoe

Okay, I felt it wouldn't give me the high score I needed. My model actually reached a 50.1 mAP50 score, yet I still kept the 49.3 mAP50 checkpoint because its mAP50-95 (around 0.243) was higher than that of the 50.1 mAP50 checkpoint.

I tried SAHI as well. It didn't help in any way.

stefan027

I was quite surprised to see I got to 3rd on the private LB from 16th (I think) on the public LB. I never tried submitting an ensemble because I just couldn't get two good models trained on a T4 in 9 hours. I used the MMDetection library to fine-tune a single DINO model. To get it to train for a reasonable number of epochs on a T4, I started from a pretrained model with a ResNet-50 backbone but replaced the backbone with ConvNeXt-tiny, which made a big difference. I also trained on square 800x800 images with fp16, batch size 4, and 2 gradient accumulation steps. After doing CV tests, I trained on all the training data for 13 epochs. End to end takes about 7 to 7.5 hours on a single T4. I'll do a more detailed write-up in the next few days.
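A hedged sketch of what such a backbone swap can look like in an MMDetection 3.x config. The base config name, channel numbers, and hyperparameters below are illustrative assumptions based on MMDetection's config conventions, not stefan027's actual files:

```python
# Hypothetical config fragment: swap DINO's ResNet-50 backbone for ConvNeXt-tiny.
_base_ = 'dino-4scale_r50_8xb2-12e_coco.py'  # assumed base config name

model = dict(
    backbone=dict(
        _delete_=True,               # drop the inherited ResNet-50 settings
        type='mmpretrain.ConvNeXt',  # requires mmpretrain to be installed
        arch='tiny',
        out_indices=(1, 2, 3),
        drop_path_rate=0.2,
        gap_before_final_norm=False,
    ),
    neck=dict(in_channels=[192, 384, 768]),  # ConvNeXt-tiny stage channels
)

# fp16 training with gradient accumulation (effective batch = 4 x 2)
optim_wrapper = dict(type='AmpOptimWrapper', accumulative_counts=2)
train_dataloader = dict(batch_size=4)
```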

16 Dec 2024, 14:27
Upvotes 5
Koleshjr
Multimedia university of kenya

That was a huge jump, @stefan027.

Congrats 👏. I tried Co-DETR at the beginning of the comp, but it was too slow, so I gave up on it midway and focused on YOLO.

GIrum
Adama Science and Technology University

We would love to see it.