
Amini Cocoa Contamination Challenge

Helping Ghana · $7,000 USD · Completed (11 months ago)
Computer Vision · Object Detection
928 joined · 255 active
Start: Feb 14, 2025 · Close: May 11, 2025 · Reveal: May 12, 2025
stefan027
2nd place solution
Notebooks · 28 May 2025, 08:07 · 5

Our solution (Team Neural Beans: @100i and me) is conceptually simple: a single object detection model. We fine-tuned a DINO model with a Swin Transformer (Base) backbone using the MMDetection library.

There are two main pre-trained versions of this model in mmdet: a version with a ResNet-50 backbone and 4 scales of feature maps (DINO-4scale-R-50), and a more performant version with a Swin-Large backbone and 5 scales of feature maps (DINO-5scale-Swin-L). The DINO-5scale-Swin-L model is too big and slow given the resource restrictions of this challenge. We experimented with different backbones (including ConvNeXt (Tiny and Small), Swin (Small, Base and Large), and SwinV2 (Base)), 4 and 5 feature scales, and different image sizes. Our best combination uses a Swin-Base backbone, 4 feature scales and square 640x640 images.
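As a rough sketch of what this combination looks like in MMDetection's config style: the dict keys below follow mmdet's DINO configs from memory, and the exact key names, `out_indices`, and neck settings are assumptions that may differ between mmdet versions.

```python
# Hypothetical sketch of the key config overrides for a DINO model with a
# Swin-Base backbone and 4 feature scales (not the team's actual config file).
model_overrides = dict(
    backbone=dict(
        type="SwinTransformer",
        embed_dims=128,            # Swin-Base width
        depths=(2, 2, 18, 2),      # Swin-Base stage depths
        num_heads=(4, 8, 16, 32),  # Swin-Base attention heads
        out_indices=(1, 2, 3),     # 3 backbone levels; the neck adds a 4th scale
    ),
    neck=dict(type="ChannelMapper", num_outs=4),  # 4 feature scales total
)
train_image_scale = (640, 640)  # square inputs, as described above
print(model_overrides["neck"]["num_outs"], train_image_scale)
```

In mmdet this would typically be applied by inheriting the DINO-4scale base config and overriding the backbone and neck sections.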

Our training pipeline includes random horizontal and vertical flips, colour variations (using mmdet's YOLOXHSVRandomAug augmentation), and different image scales. Experiments with mosaic and mixup didn't improve the model. We utilised an Exponential Moving Average (EMA) of the weights during training, which improved validation performance.
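In mmdet's transform-dict style, an augmentation pipeline like the one described might look as follows. The transform names (`RandomFlip`, `YOLOXHSVRandomAug`, `RandomChoiceResize`, `PackDetInputs`) are real mmdet transforms, but the specific probabilities and scale choices here are illustrative, not the team's actual values.

```python
# Illustrative training pipeline sketch; parameters are assumptions.
train_pipeline = [
    dict(type="LoadImageFromFile"),
    dict(type="LoadAnnotations", with_bbox=True),
    # random horizontal and vertical flips
    dict(type="RandomFlip", prob=0.5, direction=["horizontal", "vertical"]),
    # colour (HSV) jitter, as mentioned in the write-up
    dict(type="YOLOXHSVRandomAug"),
    # multi-scale training via randomly chosen square sizes
    dict(type="RandomChoiceResize",
         scales=[(480, 480), (560, 560), (640, 640)], keep_ratio=True),
    dict(type="PackDetInputs"),
]
print(len(train_pipeline))
```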

The model was trained for 12 epochs with a learning rate of 0.0001, with linear warmup over the first epoch, and cosine annealing beginning after the 6th epoch. The model was trained with mixed precision to reduce GPU memory usage.
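The learning-rate schedule above can be reimplemented in a few lines to make it concrete. One assumption: the cosine phase is taken to anneal towards zero at epoch 12, which the write-up doesn't state explicitly.

```python
import math

# Schedule described above: base lr 1e-4, linear warmup over epoch 1,
# constant until epoch 6, then cosine annealing to (assumed) zero at epoch 12.
BASE_LR, WARMUP_END, COSINE_START, TOTAL = 1e-4, 1.0, 6.0, 12.0

def lr_at(epoch: float) -> float:
    """Learning rate as a function of (fractional) epoch in [0, 12]."""
    if epoch < WARMUP_END:                       # linear warmup
        return BASE_LR * epoch / WARMUP_END
    if epoch < COSINE_START:                     # flat phase
        return BASE_LR
    t = (epoch - COSINE_START) / (TOTAL - COSINE_START)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * t))

print(lr_at(0.5), lr_at(3.0), lr_at(12.0))
```

In mmdet this would normally be expressed via the `param_scheduler` config (a linear warmup scheduler followed by a cosine-annealing one) rather than a hand-written function.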

Resources:

Discussion · 5 answers
CodeJoe

Every time I come across that library, it's like seeing stars - very difficult to understand. The expertise behind it is undeniable. A huge congratulations to you guys.

28 May 2025, 08:30
Upvotes 2
stefan027

Yeah, that library definitely needs some maintenance, because it's getting harder and harder to manage the dependencies, especially in environments like Kaggle and Colab.

CodeJoe

Very true. Anyway, great work - I really learnt from your solution.

analyst

How do you manage slow inference speed when using mmdetection?

28 May 2025, 11:34
Upvotes 0
stefan027

Thanks for the question. We had no problems with inference speed, so it is not something we spent much time thinking about. A few points:

  1. mmdetection is not a model; it's a toolbox. I don't think mmdetection itself is slow, but some model implementations are. Many different object detection models are implemented in it: some are big and slow, while others are designed for real-time inference.
  2. The DINO model that we used is not intended for real-time inference (i.e., for video), but we didn't need real-time inferencing for this challenge.
  3. I just checked the inference speed of our model using mmdet (i.e., not optimised for inference) on a Kaggle T4 instance, and it takes only approx. 500ms for inference on a single image.
  4. The competition's inference time restriction of 3 hours was extremely generous. Even with TTA, we ran batch inference on all test images in about 12 minutes on a T4 GPU.
  5. Models trained with mmdet can be optimised for inference. There is a library called mmdeploy just for that. We show how to convert our model to ONNX in this notebook. As with all the open-mmlab libraries, getting everything set up to work correctly can be tricky though.