Hi guys,
I am training an ensemble solution. Detectron2 + Custom part.
At the moment, if evaluated separately, the score from Detectron2 alone sits around 3.5, and the second part then reduces that to 2.5.
I am looking for someone with a model that scores a bit below 2.5, to see whether my second component can bring the result down to 1.5 or better.
I know collaboration is frowned upon in the last 5 days. Yet, I believe that rule is primarily aimed at potential winners, to prevent team formations that secure the prize. So I do not see this as wrongdoing: I offer code and explanations (the code is in Python) of how I constructed the second part of my solution, in exchange for a net and weights that produce something below 2.5 on submission, without any additional solution components or postprocessing. Of course, I am happy to form a team if approved by the organizers.
I also have masks for the training data. With or without them, that did not seem to influence Detectron2 much though; Mask R-CNN scores around the same 3.5 as the other R-CNNs I tried.
I speak English, Russian and French.
I am using an ensemble of YOLO models that gives around 2.3, but with a small trick and a regression CNN I brought it down to 1.9.
My output correction algorithm takes the output of Detectron2 as its input, i.e. bboxes and scores. If an ensemble of YOLO models outputs the same, then I think what I have done has the potential to reduce 2.3 to around 1.3. At least that is what I am getting, but I cannot get Detectron2 to produce anything better than 3.0, even though I am configuring it in different ways suitable for larger images.
I guess I also have a similar setup. The small trick is that I combined the predictions of 3 YOLOv7 models and 1 YOLOv5 model (trained at 512 image size) (bboxes, scores) with CNN regression (2 different architectures + stratified k-fold -> TTA, making 4 models for PBW and 15 for ABW) and put the entire thing into XGBoost.
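For anyone curious, a stacking setup along these lines can be sketched roughly as follows. This is my own illustration, not the poster's code: the specific summary features (count, score sum, box areas) are guesses at what a tabular model like XGBoost might be fed per image.

```python
import numpy as np

def detection_features(boxes, scores):
    """Collapse one image's detections (boxes as an (N, 4) xyxy array,
    scores as an (N,) array) into a fixed-length feature vector that a
    tabular model such as XGBoost can consume."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        return np.zeros(6)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return np.array([
        scores.size,    # raw detection count
        scores.sum(),   # score-weighted ("soft") count
        scores.mean(),
        scores.min(),
        areas.mean(),
        areas.std(),
    ])

# One row per image: detector features from each ensemble member plus the
# CNN outputs, then fit something like xgboost.XGBRegressor on the counts.
```

The detector-level count is usually close already, so the gradient-boosted model only has to learn small corrections on top of it.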
Manually checking, my predictions look fairly good when pbw/abw < 100, but the error margin grows once pbw goes above the 150-200 range.
Just sharing the things I did, not sure if there's any time left for collaboration; I will probably open-source all the code :)
What do you mean by 'with CNN regression'? Is it to classify between abw and pbw from the cropped bboxes of bugs on the original image?
Sorry for the late reply. I mistakenly said I was using CNN regression (by CNN regression, I mean a CNN with a regression loss (MSE/MAE) and no activation, or ReLU, in the output layer).
I used the term CNN regression in the first place because I also experimented with a cross-entropy loss for predicting only the number of ABW, which seemed to perform a bit better for me, so I stuck with that setup for generating the predictions. Later I also used cross-entropy loss to train the model to predict PBW.
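To make the distinction concrete, here is a minimal numpy sketch of the two output heads described above (my own illustration, not the poster's code): a regression head emits one unbounded value trained with MSE, while the cross-entropy variant treats each possible count 0..K as a class and picks the argmax at inference.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over a 1-D logit vector.
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def regression_loss(raw_output, true_count):
    # Regression head: a single linear (or ReLU) output, MSE against the count.
    return float((raw_output - true_count) ** 2)

def classification_loss(logits, true_count):
    # Classification head: one logit per count in {0, ..., K};
    # the count is treated as a class label under cross-entropy.
    return float(-log_softmax(np.asarray(logits, dtype=float))[true_count])

def classification_predict(logits):
    # Predicted count is simply the most likely class.
    return int(np.argmax(logits))
```

The classification framing caps the predictable range at K, but it lets the model express uncertainty over discrete counts instead of regressing a single number.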
> What do you mean by 'with CNN regression'? Is it to classify between abw and pbw from the cropped bboxes of bugs on original image?
Nope, I did not do that, given that there were rarely both ABW and PBW in a single image.
Looking at the 2nd and 7th solutions, my hypothesis is that using a bigger image size can dramatically improve the results; I trained all the YOLOs and CNNs at 512/640 image sizes.
Thank you for the clarifications. I can confirm your hypothesis about the image size. The day before last I decided to check what happens when larger image sizes are used for training. Indeed, setting 1600 instead of 800 for the shortest-side size (Detectron2's suggested image sizing), I got a much better result with detection only. And that is training with a batch of 2 on 48 GB of memory for not so many iterations. Surprisingly, inference took only 8 GB. Trying 3200 for the shortest side caused an out-of-memory error on some images even with batch = 1; a change to a 96 GB machine should help. As it is now obvious that resizing the image to a smaller size is not a smart choice, another option is to chop images into several pieces, 4 or 16, depending on available hardware. I am keen, though, to see where the approach of 'do nothing at all, just train on a machine with more GPU memory' takes me. Above 1.0 or below 1.0: that's the question.
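A minimal sketch of the tiling option, in case it helps anyone (my own illustration, with assumed function names): split the full-resolution image into a grid, run detection per tile, then shift each tile's boxes back into full-image coordinates before merging the results (e.g. with an NMS pass).

```python
import numpy as np

def tile_image(img, grid=2):
    """Split an (H, W, C) image into grid*grid non-overlapping tiles.
    Returns (tile, (x_offset, y_offset)) pairs; trailing pixels are
    dropped when H or W is not divisible by grid."""
    th, tw = img.shape[0] // grid, img.shape[1] // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            tile = img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            tiles.append((tile, (c * tw, r * th)))
    return tiles

def shift_boxes(boxes, offset):
    """Map xyxy boxes detected on a tile back to full-image coordinates."""
    dx, dy = offset
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    return boxes + np.array([dx, dy, dx, dy])
```

In practice a small overlap between tiles helps catch objects cut by the tile borders, at the cost of needing deduplication/NMS where tiles meet.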
Yes! It does look like it. I am currently training YOLOv7-tiny on my RTX 2070 Super with a batch size of 4 and image size 1504. On its 10th epoch it already seems to be nearly surpassing a YOLOv7 trained for 60 epochs at 640 image size.