Hey Everyone,
I would like to know which vision models other contestants have been using, and whether you have had any success with them.
I have used YOLO and Faster R-CNN. Research suggests Faster R-CNN should do better on this problem, yet YOLO is the only model that produces okay test scores for me.
Has anyone scored high using a model other than YOLO?
Still, sticking with YOLO remains the best option considering the 9-hour maximum runtime stipulated for this task.
**1. YOLO (You Only Look Once)**
> Faster — A single-stage detector that predicts bounding boxes and classes in one pass.
> Optimized for real-time detection (typically 30–60 FPS on a good GPU).
> Designed for edge devices (especially YOLOv5 and YOLOv8 with nano or small models).
**2. Faster R-CNN**
❌ Slower — A two-stage detector:
- First proposes regions using a Region Proposal Network (RPN)
- Then classifies and refines bounding boxes
❌ Often reported at around 5–10 FPS, though the exact figure depends on the backbone, input resolution, and GPU.
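FPS figures like the ones above vary a lot with hardware, model size, and input resolution, so it may be worth measuring them on your own setup. Here is a minimal sketch, assuming a hypothetical `predict` callable that wraps one forward pass of whichever model you are testing. The names `measure_fps` and `predict` are illustrative and do not come from any library.

```python
import time

def measure_fps(predict, images, warmup=3):
    """Return the average frames per second of `predict` over `images`.

    `predict` is a hypothetical callable that runs one forward pass.
    """
    if not images:
        raise ValueError("need at least one image to time")
    # Warm-up calls are not timed, so one-off setup costs do not skew the result.
    for image in images[:warmup]:
        predict(image)
    start = time.perf_counter()
    for image in images:
        predict(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed
```

If the model runs on a GPU, the calls are usually asynchronous, so `predict` should wait for the result (for example by moving the output to the CPU) before returning. Otherwise the timing will look better than it really is.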
Yeah, training is much slower for Faster R-CNN, but I found a paper called "Detecting soybean leaf disease from synthetic image using multi-feature fusion faster R-CNN" where they achieve decent accuracy with the model. They trained for 10 epochs and got decent results. When I train for 10 epochs on Google Colab, it takes about 20 min per epoch on a T4 instance. The torch implementation of Faster R-CNN can also be exported to ONNX, which can help with mobile deployment.
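To put those numbers against the 9-hour limit, here is a rough back-of-the-envelope sketch. It assumes the competition hardware is about as fast as a Colab T4 and ignores data loading, evaluation, and inference time, so treat the result as an upper bound. The function names are made up for illustration.

```python
def total_hours(minutes_per_epoch, epochs):
    """Estimated training time in hours for a fixed number of epochs."""
    return minutes_per_epoch * epochs / 60

def epochs_within_budget(minutes_per_epoch, budget_hours):
    """Number of whole epochs that fit inside the time budget."""
    return int(budget_hours * 60 // minutes_per_epoch)

# Figures from the post: about 20 min per epoch, 10 epochs, 9-hour limit.
print(round(total_hours(20, 10), 1))   # hours needed for 10 epochs
print(epochs_within_budget(20, 9))     # whole epochs that fit in 9 hours
```

At 20 min per epoch, 10 epochs come to roughly 3.3 hours, and about 27 epochs would fit in 9 hours under these assumptions.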
For me it's YOLO (I tried different versions, such as 11, 12, and 9, and different sizes). It's working pretty well, but I'm already having trouble fitting it into the 9-hour training window, so I don't know if Faster R-CNN will work here.
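One way to stay inside a fixed training window, whichever model you pick, is to stop starting new epochs once the next one is unlikely to finish in time. Below is a minimal sketch, assuming a hypothetical `train_one_epoch(epoch)` callable that you supply. The helper `train_with_deadline` and its parameters are illustrative and not part of any framework.

```python
import time

def train_with_deadline(train_one_epoch, max_epochs, budget_seconds,
                        reserve_seconds=0.0, clock=time.monotonic):
    """Run epochs until max_epochs is reached or the time budget would be exceeded.

    `reserve_seconds` is time held back for inference, saving, and so on.
    Returns the number of epochs actually completed.
    """
    start = clock()
    durations = []
    for epoch in range(max_epochs):
        remaining = budget_seconds - reserve_seconds - (clock() - start)
        # Use the slowest epoch so far as a cautious estimate for the next one.
        expected = max(durations) if durations else 0.0
        if remaining < expected:
            break
        epoch_start = clock()
        train_one_epoch(epoch)
        durations.append(clock() - epoch_start)
    return len(durations)
```

For example, `train_with_deadline(my_epoch_fn, 100, 9 * 3600, reserve_seconds=3600)` would keep one hour back for everything that happens after training. Note that the first epoch always runs, because there is no timing estimate yet.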