First of all, I would like to thank Zindi and Wadhwani AI for this great competition.
My solution is quite simple; it mainly requires a good understanding of the object detection framework.
I used DINO from the detrex repository.
I trained two models, dino_swin_base_384_4scale_12ep and dino_swin_base_384_4scale_36ep, on all of the data, including the negative samples, so that the models generalize well.
I then adapted the visualization code to create the submission file, ensembling the two models' predictions with Weighted Boxes Fusion (WBF).
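For readers unfamiliar with WBF, here is a minimal pure-Python sketch of the idea. It is not the author's submission code and not the API of any particular library; it assumes each model's predictions are (box, score, label) tuples with boxes as normalized (x1, y1, x2, y2), and the IoU threshold of 0.55 is an assumed default. Unlike NMS, which keeps one box and discards its overlaps, WBF averages the coordinates of overlapping same-label boxes, weighted by their scores. Per-model weights and a minimum-score filter are omitted for brevity.

```python
# Minimal Weighted Boxes Fusion (WBF) sketch; not the author's code.
# Assumes each prediction is (box, score, label) with box = (x1, y1, x2, y2)
# in normalized [0, 1] coordinates, and one prediction list per model.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(members: list[tuple[Box, float]]) -> Box:
    """Score-weighted average of the member boxes' coordinates."""
    total = sum(score for _, score in members)
    return tuple(sum(box[i] * score for box, score in members) / total for i in range(4))


def weighted_boxes_fusion(preds, iou_thr: float = 0.55):
    """preds: one list of (box, score, label) per model. Returns fused triples."""
    num_models = len(preds)
    # Process all predictions from all models, highest score first.
    flat = sorted((p for model in preds for p in model), key=lambda p: p[1], reverse=True)
    clusters = []  # each cluster: {"label", "members": [(box, score)], "box": fused box}
    for box, score, label in flat:
        # Find the same-label cluster whose fused box overlaps this box the most.
        best, best_iou = None, iou_thr
        for cluster in clusters:
            if cluster["label"] == label:
                overlap = iou(cluster["box"], box)
                if overlap > best_iou:
                    best, best_iou = cluster, overlap
        if best is None:
            clusters.append({"label": label, "members": [(box, score)], "box": box})
        else:
            best["members"].append((box, score))
            best["box"] = fuse(best["members"])
    fused = []
    for cluster in clusters:
        scores = [s for _, s in cluster["members"]]
        # Average score, down-weighted when fewer models contributed to the cluster.
        conf = sum(scores) / len(scores) * min(len(scores), num_models) / num_models
        fused.append((cluster["box"], conf, cluster["label"]))
    return sorted(fused, key=lambda r: r[1], reverse=True)


# Usage: two models, one overlapping detection and one seen by only one model.
model_a = [((0.10, 0.10, 0.50, 0.50), 0.9, 0)]
model_b = [((0.12, 0.10, 0.52, 0.50), 0.7, 0), ((0.60, 0.60, 0.90, 0.90), 0.4, 0)]
for box, conf, label in weighted_boxes_fusion([model_a, model_b]):
    print([round(c, 4) for c in box], round(conf, 3), label)
```

The final rescaling means a box that only one of the two models detected keeps half of its score, so detections both models agree on rank higher. The ensemble_boxes package by ZFTurbo offers a fuller weighted_boxes_fusion with per-model weights and a skip threshold.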
I only trained at 384x384 resolution, yet the score was quite consistent across the public and private test sets, so I think training at a higher resolution could achieve better performance. Unfortunately, my hardware limitations prevented me from trying it.
Thanks for reading.
Congrats on your 4th place😃. Looks like with more hardware you could beat me. With 384x384, the result is already great.
Hi, thanks for this description. I am also trying to use DINO on the same dataset. I have observed that even a batch size of 1 needs around 19-20 GB of GPU memory, which means that to train with a batch size of 4 I would need four such GPUs.
First of all, is this observation correct? And what batch size did you use?
@tinyswish it would be really helpful if you could answer these questions.
Correct, I used 4 RTX 3090s with a batch size of 4 or 6.