+1. For us, the 9-hour training limit allowed us to train and ensemble two folds: one from a 20-fold split (CV = 0.491) and the other from a 24-fold split (CV = 0.52).
The ensemble used WBF, and we trained RT-DETR models for around 40 epochs.
Thanks @nymfree, that's impressive. I personally avoided ensembles, but I'm glad to hear one could fit in the 9-hour limit. Also, what IoU threshold did you use for the WBF ensemble, and what skip threshold?
Also, what image size and batch size did you use?
thank you
We used a 640 image size and a 0.65 IoU threshold, and kept the default value for the skip threshold.
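For readers unfamiliar with WBF: unlike NMS, which keeps one box per cluster and discards the rest, WBF averages the overlapping boxes, weighted by confidence. Below is a minimal toy sketch of the idea, not the ensemble-boxes library's implementation (which also normalizes coordinates and rescales scores by the number of models):

```python
# Toy sketch of Weighted Boxes Fusion (WBF) over pooled model predictions.
# Boxes are greedily clustered by IoU against each cluster's seed box, and
# each cluster is fused into one box whose coordinates are the
# confidence-weighted average of its members.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def wbf(boxes, scores, iou_thr=0.65, skip_thr=0.0):
    """Fuse overlapping boxes; boxes/scores are pooled over all models."""
    # Drop boxes below the skip threshold, then sort by score (descending).
    order = sorted(
        (i for i, s in enumerate(scores) if s >= skip_thr),
        key=lambda i: -scores[i],
    )
    clusters = []  # each cluster is a list of (box, score) pairs
    for i in order:
        for c in clusters:
            if iou(c[0][0], boxes[i]) > iou_thr:  # compare to cluster seed
                c.append((boxes[i], scores[i]))
                break
        else:
            clusters.append([(boxes[i], scores[i])])
    fused = []
    for c in clusters:
        total = sum(s for _, s in c)
        box = [sum(b[k] * s for b, s in c) / total for k in range(4)]
        fused.append((box, total / len(c)))  # mean score over members
    return fused
```

In practice you would use `weighted_boxes_fusion` from the ensemble-boxes package, which expects normalized coordinates and per-model box lists rather than one pooled list.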
Great, thanks!
What did you use? Which model and image resolution?
A single YOLO11x, image size 800, trained for 52 epochs.
All along I was training for 100 epochs 😭😭. I used YOLO11s throughout with 1024px imgsz on a 10-fold split (CV = 49.3).
@Koleshjr @nymfree Is there a trick to boost your model's score on the test set after training?
Computational cost scales roughly quadratically with image resolution (1024² vs. 640² is about a 2.6× factor), so very likely only one model can be trained within 9 hours at 1024 resolution.
Yes, it took about 7 hours to train. I thought it gave a better result, so why not 😌
Other than the TTA built into Ultralytics, there is SAHI, where you basically infer on patches of high-resolution images. For example, if the image has 2048x2048 resolution, you don't resize it to 1024; you infer on four 1024x1024 patches instead, which might preserve high-resolution details. I tried it early on when I had weak models and it didn't seem to make things better. In hindsight, I should have revisited it.
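The patch-based inference idea can be sketched without the SAHI library itself: compute tile origins that cover the full-resolution image, run the detector on each tile, and shift the resulting boxes back into full-image coordinates (then merge duplicates with NMS or WBF). The helpers below are illustrative, not SAHI's actual API:

```python
# Sketch of sliced inference: tile a large image, detect per tile, then
# map each tile's boxes back to full-image coordinates.

def tile_origins(width, height, tile=1024, overlap=0.0):
    """Top-left corners of tile-sized windows covering the image.
    Windows step by tile*(1-overlap); the final window on each axis is
    clamped to the image edge so the borders are always covered."""
    step = max(1, int(tile * (1 - overlap)))
    def axis(size):
        if size <= tile:
            return [0]
        origins = list(range(0, size - tile + 1, step))
        if origins[-1] != size - tile:  # clamp a last window to the edge
            origins.append(size - tile)
        return origins
    return [(x, y) for y in axis(height) for x in axis(width)]

def shift_box(box, origin):
    """Map an (x1, y1, x2, y2) box from tile coords to image coords."""
    x0, y0 = origin
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)
```

A 2048x2048 image with `tile=1024` and no overlap yields exactly the four patches from the example above; in practice SAHI defaults to some overlap (e.g. 20%) so objects cut by a tile boundary are still seen whole in a neighboring tile.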
Actually, you could train two models at 1024px given that he was using the smaller version of YOLO11, but then you would have to sacrifice the number of epochs you train the two models for.
I also tried SAHI; in this competition it wasn't helpful at all.
First of all, everyone who couldn't crack a score of 40 was most likely using the default YOLO confidence threshold of 0.25. I really struggled with this at the beginning of the comp, so using a really small threshold helped. Then playing with the IoU in YOLO's predict call helped too: for example, 0.5 worked better for me than the default 0.7. Applying TTA also helped.
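Why the default threshold hurts: mAP is computed over the full ranked list of predictions, so filtering at conf=0.25 deletes the low-confidence tail before evaluation ever sees it, which can only lose recall. A toy illustration with made-up numbers:

```python
# Toy illustration: the default conf=0.25 discards low-confidence true
# positives that an mAP-style metric would otherwise still credit.
# preds: (confidence, hits_a_ground_truth_box) pairs, already matched.
preds = [
    (0.92, True), (0.81, True), (0.40, False),
    (0.18, True), (0.07, True), (0.03, False),
]
num_gt = 5  # ground-truth boxes in this made-up evaluation set

def recall_at(thresh):
    """Recall after dropping predictions below a confidence threshold."""
    kept = [hit for conf, hit in preds if conf >= thresh]
    return sum(kept) / num_gt

print(recall_at(0.25))   # 0.4 -- two true positives were filtered out
print(recall_at(0.001))  # 0.8 -- the low-confidence tail is kept
```

This is why near-zero confidence thresholds (like the 0.001 mentioned below) are standard for mAP-scored competitions, even though they would be useless for a deployed detector.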
I did all of those things. My IoU was 0.559 and my confidence was 0.001, but I still couldn't crack the 50 score.
Okay, I felt it wouldn't give me the high score I needed. Actually, my model reached a 50.1 mAP50 at one checkpoint, yet I still submitted the 49.3 mAP50 checkpoint because its mAP50-95 was around 0.243, higher than that of the 50.1 mAP50 checkpoint.
I tried SAHI as well; it didn't help in any way.
I was quite surprised to see I got to 3rd on the private LB from 16th (I think) on the public LB. I never tried submitting an ensemble because I just couldn't get two good models trained on a T4 in 9 hours. I used the MMDetection library to fine-tune a single DINO model. To get it to train for a reasonable number of epochs on a T4, I started with a pretrained model with a ResNet-50 backbone, but I replaced the backbone with ConvNeXt-tiny, which made a big difference. I also trained on square 800x800 images with fp16, batch size 4, and 2 gradient-accumulation steps (effective batch size 8). After doing CV tests, I trained on all the training data for 13 epochs. End to end takes about 7 to 7.5 hours on a single T4. I'll do a more detailed write-up in the next few days.
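For the backbone swap, here is a rough sketch of what this might look like in an MMDetection 3.x config. The base config name comes from mmdetection's `configs/dino/` directory, but the ConvNeXt settings, neck channel widths, and checkpoint path are assumptions to verify against your mmdet/mmpretrain versions, not the author's actual config:

```python
# Hypothetical MMDetection 3.x config: DINO with its ResNet-50 backbone
# replaced by ConvNeXt-tiny. Channel widths, out_indices, and the
# checkpoint path are assumptions -- check them against your versions.
_base_ = ['./dino-4scale_r50_8xb2-12e_coco.py']

# ConvNeXt lives in mmpretrain, so register its models first.
custom_imports = dict(imports=['mmpretrain.models'],
                      allow_failed_imports=False)

model = dict(
    backbone=dict(
        _delete_=True,                  # drop the inherited ResNet-50 block
        type='mmpretrain.ConvNeXt',
        arch='tiny',
        out_indices=(1, 2, 3),          # stride-8/16/32 feature maps
        init_cfg=dict(type='Pretrained',
                      checkpoint='convnext-tiny.pth',  # placeholder path
                      prefix='backbone.')),
    neck=dict(in_channels=[192, 384, 768]))  # ConvNeXt-tiny stage widths

# fp16 training with gradient accumulation (batch 4 x 2 steps = effective 8)
optim_wrapper = dict(type='AmpOptimWrapper', accumulative_counts=2)
train_dataloader = dict(batch_size=4)
```

The `_delete_=True` flag is what lets a child config discard the inherited ResNet block entirely instead of merging fields into it.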
that was a huge jump @stefan027
Congrats 👏. I tried Co-DETR at the beginning of the comp, but it was too slow, so I gave up on it midway and focused on YOLO.
we would love to see it.