After the baseline, every improvement I got came from analysing failing instances (absolute difference > 5) on the validation set. I also analysed the failing cases of my best model and observed that most of the errors came from instances where:
1) Dust of a similar colour surrounds the bollworms, so the model was classifying the dust as bollworms.
Examples: "id_d8200a46b8f136a269783aca", "id_1512eccc96bba912d8df58e1", "id_325de665069d4bc5fc94a467"
2) Bollworms are clustered together, so the model was not able to detect all of them (maybe because NMS suppresses overlapping boxes).
Examples: "id_824cb8757b7c655ae530a8d1", "id_498d2243a9dcb99c39a883e0", "id_287e7c442fd0e2e17e705726".
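For the clustered case, one common mitigation is to soften NMS so that heavily overlapping true positives get their scores decayed instead of being discarded outright. Below is a minimal NumPy sketch of Gaussian Soft-NMS; this is not YOLOv5's own postprocessing, and the function names, `sigma`, and thresholds are illustrative:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box [x1, y1, x2, y2] and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping boxes' scores instead of dropping them."""
    boxes = boxes.astype(float)
    scores = scores.astype(float)
    keep = []
    idxs = np.arange(len(boxes))
    while len(idxs) > 0:
        # Pick the highest-scoring remaining box.
        best = idxs[np.argmax(scores[idxs])]
        keep.append(best)
        idxs = idxs[idxs != best]
        if len(idxs) == 0:
            break
        # Decay scores of the others in proportion to their overlap with it.
        overlaps = iou(boxes[best], boxes[idxs])
        scores[idxs] *= np.exp(-(overlaps ** 2) / sigma)
        # Drop boxes whose score fell below the threshold.
        idxs = idxs[scores[idxs] > score_thresh]
    return keep
```

A cheaper knob to try first is lowering YOLOv5's `--iou-thres` at inference, at the cost of more duplicate boxes on well-separated worms.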
For the past week, I tried different approaches such as augmentation (Gaussian noise, coarse dropout, pixel dropout, etc.), weighted sampling of the failing cases, training larger models, and a few more. But nothing improved the results by a big margin.
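For readers unfamiliar with those augmentations: all three are available in libraries such as Albumentations, but a dependency-free NumPy sketch shows what each one actually does to an image (the parameter values here are illustrative, not the ones I trained with):

```python
import numpy as np

def gaussian_noise(img, std=10.0, rng=None):
    """Add zero-mean Gaussian noise, then clip back to valid pixel range."""
    if rng is None:
        rng = np.random.default_rng(0)
    noisy = img.astype(float) + rng.normal(0.0, std, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def coarse_dropout(img, n_holes=4, hole_size=16, rng=None):
    """Zero out a few rectangular patches (a.k.a. cutout)."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = img.copy()
    h, w = img.shape[:2]
    for _ in range(n_holes):
        y = rng.integers(0, max(1, h - hole_size))
        x = rng.integers(0, max(1, w - hole_size))
        out[y:y + hole_size, x:x + hole_size] = 0
    return out

def pixel_dropout(img, drop_prob=0.05, rng=None):
    """Zero out individual pixels independently with probability drop_prob."""
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(img.shape[:2]) < drop_prob
    out = img.copy()
    out[mask] = 0
    return out
```

The intent behind all three is to stop the model from latching onto fine texture (like the dust), though in my runs none of them moved the score much.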
My approach primarily relied on YOLOv5 (plus some postprocessing; I will share the details later).
If anyone has found a workaround for this, please share. The second-place solution also came from YOLOv5; I would love to know how this issue was dealt with there.