🔎 Approach Summary
- Framework: MMDetection
- Backbone: Swin Transformer (Large) with Co-DETR head
- Training Time Constraint: 9 hours
- Key Insight: A smaller, high-signal subset of the data + a large backbone outperformed training on the full dataset with a small model
📎 Inference Notebook: Kaggle Link – feel free to upvote if you find it useful! 🔼
📊 Smart Data Subsampling
To stay within time limits while using Swin-L, I applied a signal-based filtering strategy:
- Created a fixed validation set
- Established a baseline model trained on a small seed subset
- Iteratively added training samples in batches
- Retained only those batches that improved the validation score
This led to a compact and high-signal training set.
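The filtering loop above can be sketched as a greedy forward selection. This is a minimal illustration, not the code from the notebook: `select_batches` and `train_and_score` are hypothetical names, and the post does not say whether the model was retrained from scratch or fine-tuned for each candidate batch, so that detail is left to the callback.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_batches(
    seed: Sequence[T],
    candidate_batches: Sequence[Sequence[T]],
    train_and_score: Callable[[List[T]], float],
) -> Tuple[List[T], float]:
    """Greedy forward selection of training batches.

    `train_and_score` is a hypothetical callback: it trains a model on the
    given samples and returns its score on the fixed validation set
    (higher is better).
    """
    selected = list(seed)
    best_score = train_and_score(selected)  # baseline trained on the seed only

    for batch in candidate_batches:
        trial = selected + list(batch)
        score = train_and_score(trial)
        # Keep the batch only if it improves the validation score.
        if score > best_score:
            selected, best_score = trial, score

    return selected, best_score
```

In practice each call to `train_and_score` is a full (or shortened) training run, so the batch size trades selection granularity against total compute.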
🚀 Model Pipeline
- Backbone: Swin-Large pretrained on ImageNet-22k
- Detector: Co-DETR with 5-scale deformable attention
- Training Augmentations:
  - Color jitter
  - Horizontal, vertical, and diagonal flips
  - Random resize to 512/640 px (multi-scale training)
- Training Epochs: 10
- Optimizer: AdamW (lr=1e-4, weight decay=1e-4)
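For readers unfamiliar with MMDetection 3.x configs, the training settings above might look roughly like the fragment below. This is a sketch under assumptions, not the actual config from the notebook: the flip probabilities, the square resize scales, `keep_ratio`, `val_interval`, and the use of `PhotoMetricDistortion` for color jitter are all guesses, and the Co-DETR model definition itself is omitted.

```python
# Hypothetical MMDetection 3.x config fragment. Only the epoch count,
# optimizer type, lr, and weight decay come from the post; everything
# else is an illustrative assumption.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    # Color jitter (assumed to be PhotoMetricDistortion with defaults).
    dict(type='PhotoMetricDistortion'),
    # Horizontal, vertical, and diagonal flips; probabilities are assumed.
    dict(
        type='RandomFlip',
        prob=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal']),
    # Multi-scale training: pick one of the two scales per sample.
    dict(
        type='RandomChoiceResize',
        scales=[(512, 512), (640, 640)],
        keep_ratio=True),
    dict(type='PackDetInputs'),
]

# 10 training epochs, validating after each one (interval is assumed).
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=10, val_interval=1)

# AdamW with lr=1e-4 and weight decay=1e-4, as stated in the post.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=1e-4, weight_decay=1e-4))
```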
💡 Postprocessing & Ensembling
- Checkpoints: Used last 3 epochs (8, 9, 10) for ensembling
- TTA: Horizontal flip, vertical flip, and transpose
- Soft-NMS: Class-wise with IoU threshold 0.5
- Ensembling: Bayesian-style box fusion across TTA + checkpoints
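To make the class-wise Soft-NMS step concrete, here is a small pure-Python sketch. It assumes the linear decay variant (the post does not say whether linear or Gaussian decay was used), and `soft_nms_classwise` and `min_score` are hypothetical names and values chosen for illustration. The Bayesian-style fusion step is not shown because the post does not describe it in detail.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms_classwise(
    boxes: Sequence[Box],
    scores: Sequence[float],
    labels: Sequence[int],
    iou_thr: float = 0.5,      # IoU threshold from the post
    min_score: float = 0.001,  # assumed cutoff for dropping decayed boxes
) -> List[Tuple[Box, float, int]]:
    """Linear Soft-NMS applied independently within each class."""
    kept: List[Tuple[Box, float, int]] = []
    for cls in sorted(set(labels)):
        pool = [[b, s] for b, s, l in zip(boxes, scores, labels) if l == cls]
        while pool:
            # Take the highest-scoring remaining box of this class.
            top = max(range(len(pool)), key=lambda i: pool[i][1])
            top_box, top_score = pool.pop(top)
            kept.append((top_box, top_score, cls))
            survivors = []
            for box, score in pool:
                overlap = iou(top_box, box)
                # Decay overlapping boxes instead of deleting them.
                if overlap > iou_thr:
                    score *= 1.0 - overlap
                if score >= min_score:
                    survivors.append([box, score])
            pool = survivors
    kept.sort(key=lambda det: det[1], reverse=True)
    return kept
```

Unlike hard NMS, a heavily overlapping box survives with a reduced score, which can help when objects are densely packed; boxes of different classes never suppress each other.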
🌐 Environment
- GPU: Tesla T4 (Kaggle)
- PyTorch: 2.5.1
- MMDetection: 3.3.0
📅 Reproducibility
- All code, config, and checkpoints are available in the shared Kaggle notebook
- Run all to get the submission file
Massive thanks to the Zindi team, Makerere AI Lab, and Amini.ai for this powerful real-world challenge! 🚀
Thank you so much @Brainiac for sharing 🤝 and congratulations on your first place!!!
Really appreciate it! Trust it proves helpful
Wow, this is super cool man! Congrats on your first place win!
Cool to know you successfully applied Bayesian-style box fusion. We also experimented with that a lot but couldn't squeeze much from it. Very creative data subsampling. How did you find Soft-NMS compared with a WBF ensemble?
Thanks a lot! Soft-NMS actually outperformed WBF in my ensemble, both on local validation and on the leaderboard.
Wow, same observation I made from our experiments. Thanks for clarifying.
Now it is settled: MMDetection beats YOLO
Thank you so much for sharing @Brainiac
Appreciate it!