1. ### **Handle "NEG" Class**
Identify negative samples (images without any detectable objects).
2. ### **Create a Mapping for Classes**
Since NEG has no bounding boxes, exclude it from the detection classes.
3. ### **Convert Annotations to YOLO Format**
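A conversion like this typically turns pixel-space `xmin, ymin, xmax, ymax` annotations into normalized YOLO label lines. A minimal sketch (the helper name is my own, not from the original post):

```python
def to_yolo_line(cls_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space xyxy box to a YOLO-format label line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

One such line is written per box into a `.txt` file named after the image; NEG images simply get an empty (or no) label file.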
4. ### **Stratified K-Fold Splitting**
To ensure that each fold has a similar distribution of classes, we'll implement Stratified K-Folds based on the presence of parasites.
##### **Prepare Labels for Stratification**
Create a binary label indicating whether an image contains any parasites.
5. ### **Create Train and Validation Sets for Fold 0 only**
For each fold, create separate training and validation datasets.
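Steps 4 and 5 could be sketched roughly as follows (the `image_id` and `class` columns are hypothetical placeholders for the competition's annotation table, and the toy data here is illustrative only):

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical annotation table: one row per image with its class label
df = pd.DataFrame({
    'image_id': [f'img_{i}' for i in range(10)],
    'class': ['Trophozoite', 'NEG', 'WBC', 'NEG', 'Trophozoite',
              'NEG', 'WBC', 'NEG', 'Trophozoite', 'NEG'],
})

# Binary stratification label: does the image contain any parasites?
df['has_parasite'] = (df['class'] != 'NEG').astype(int)

# Assign each image a fold so positives/negatives are balanced per fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
df['fold'] = -1
for fold, (_, val_idx) in enumerate(skf.split(df, df['has_parasite'])):
    df.loc[val_idx, 'fold'] = fold

# Train/validation split for fold 0 only
train_df = df[df['fold'] != 0]
val_df = df[df['fold'] == 0]
```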
6. ### **For Inference**
I implemented a binary classification model (EfficientNet-B4) to classify each provided image as either negative (i.e. 'NEG', or a background image) or positive (i.e. one of the other two classes), in order to ascertain which images are background. I then combined this classification model's weight path with the WBF ensemble method (where all my YOLO model fold weight paths were used) and used this strategy to generate my final submission file.
I repeated this strategy with at least 5 different types of YOLO models, but only with fold0 for each.
Note: fold0 happened to be my best fold, so I decided to train each model on fold0 only. All attained at least 0.90xx.
I experimented with YOLOv8x, YOLO11m, YOLO11x, YOLOv5x, and YOLOv5l6u. As for the WBF ensemble method, yes, it actually improved my overall score a lot, from 0.90 to 0.92 on the LB. Had we been able to tune the WBF and YOLO hyperparameters correctly, my team should have attained at least 2nd or 3rd place. But still, my teammate Ronny and I really learned a lot from this great project, I must say. The different experiments conducted on this project were an eye-opener. Also, I didn't have time to implement my Co-DINO and Co-DETR models, which are very powerful models for tackling a task like this as well.
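In code, the classifier-gated inference described above could look roughly like this sketch (function and variable names are my own assumptions, not the author's actual code):

```python
def gate_with_classifier(neg_prob, boxes, scores, labels, thr=0.5):
    """Drop all detections when the binary classifier (e.g. EfficientNet-B4)
    predicts the image is a NEG/background image; otherwise keep the
    WBF-fused boxes from the YOLO ensemble."""
    if neg_prob >= thr:
        return [], [], []          # background image: submit no boxes
    return boxes, scores, labels   # positive image: keep fused detections
```

Here `boxes`, `scores`, and `labels` would be the WBF-fused output over the fold0 weights of the five YOLO models, and `neg_prob` the classifier's probability that the image is NEG.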
Cheers !!!
Thanks for the detailed write-up. I never got to do ensembling and I am glad that wbf worked for you.
Thanks so much for these great insights, @MICADEE.
Talking about the YOLO models, what image size and batch size did you use, and did this by any means have an impact on your score?
How many folds have you used?
According to the write-up, he trained only on fold0, since this gave the best result.
Yeah, the image and batch sizes definitely have an impact on model performance, especially due to the tiny sizes of the objects involved. So, after experiments with different image and batch sizes, image size 768 with batch size 8 worked well for my modeling setup.
I trained only fold0 for each of the selected YOLO models.
You're welcome. Yeah, WBF actually worked with proper implementation.
Thanks for replying, but actually I am asking how many folds you divided the dataset into, like 5 folds or 2 folds, for example.
Oh... Yes, 5 folds...
Thank you for your reply
I wonder how you met the training-time limit of 20 hours and the inference-time limit of 2 hours with so many models. You even used YOLOv8x, YOLO11x, YOLOv5x :). My solution simply uses YOLO11s with 5 folds plus a NEG classification model with 1 fold. As you said, I could easily achieve a much higher score, even 0.94x, by ensembling multiple large models like YOLO11x, YOLOv5x6... However, that would violate the time limit.
Each of the models listed takes at most a bit over 3 hours (and at least a bit over 2 hours) to train one fold, so 3 hours multiplied by 5 models comes to a bit above 15 hours total runtime. Even one or two more models could still be included within that budget. So all of this definitely met the training runtime requirement of 20 hours. And my inference only takes at most 20 minutes to run, even with the WBF method.
Uwc.
@MICADEE Could you please tell me what the Uwc method is?
Oh... It means "You're welcome."
"Uwc" is just an acronym for "You're welcome," okay.
Sorry, I meant the WBF method.
It is a method to ensemble multiple models for the detection task. https://github.com/ZFTurbo/Weighted-Boxes-Fusion
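For intuition: WBF clusters overlapping boxes predicted by different models and replaces each cluster with a confidence-weighted average box, rather than discarding boxes the way NMS does. A toy single-class sketch of the idea (not the linked library's actual implementation, which also rescales scores by how many models agree):

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two xyxy boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def simple_wbf(boxes, scores, iou_thr=0.55):
    """Toy single-class WBF: greedily cluster boxes by IoU against each
    cluster's top box, then score-weight-average each cluster's coords."""
    order = np.argsort(scores)[::-1]     # highest confidence first
    clusters = []                        # list of lists of (box, score)
    for i in order:
        for c in clusters:
            if iou(boxes[i], c[0][0]) > iou_thr:
                c.append((boxes[i], scores[i]))
                break
        else:
            clusters.append([(boxes[i], scores[i])])
    fused_boxes, fused_scores = [], []
    for c in clusters:
        b = np.array([x[0] for x in c], dtype=float)
        s = np.array([x[1] for x in c], dtype=float)
        fused_boxes.append((b * s[:, None]).sum(0) / s.sum())
        fused_scores.append(s.mean())    # toy choice: mean cluster score
    return np.array(fused_boxes), np.array(fused_scores)
```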
Thanks. Looking into it.
@BiBanhBao Thanks so much for explaining your approach.
I would like to know: for how many epochs did you train the YOLO11s model, and what was your patience hyperparameter?
Thank you :)
100 epochs, no patience. My key is data processing; I don't focus too much on model parameters. I won't share much about my solution because teams are still in the phase of resubmitting their solutions. I will share it later.
Wow, sounds good. Thanks!!!
hello there, did the challenge conclude yet?
Congrats for the win, any updates on opensourcing the solution?
We will do a full writeup later on, including the open sourcing of our code :)
OK thanks.
We are impatiently awaiting your solution.
hello there, any updates on opensourcing the solution?
We have not heard from Zindi yet, I think the deadline for code review is the 11th.
Well, I tried to use Ultralytics Yolov8... Yolov11, but the results were terrible. I labeled the NEG class as 'background image.' After about 30 epochs, I got a 97 mAP for the WBC class, but only 65 mAP for the Trophozoite class and an overall mAP of 81. One thing I noticed is that the bounding boxes for Trophozoite are much smaller than those for WBC, so I realized this is a small object detection problem. Although Trophozoite is the majority class, due to the tiny size of its bounding boxes, the model ironically struggled to correctly classify the majority class and performed better at classifying the minority class (WBC).
Below is part of the code.

```python
from ultralytics import YOLO
import yaml

# Dataset config for Ultralytics YOLO
config_yaml = {
    'train': '/kaggle/input/lacuna-malaria-dataset/dataset_augmented_v2-yolo/images/train',
    'names': {0: 'WBC', 1: 'Trophozoite'},  # second class inferred from context
}

# Write the dataset config to disk
with open('data.yaml', 'w') as file:
    yaml.dump(config_yaml, file)

model = YOLO('yolov8n.pt')
hyperparameters = {
    # ... (truncated in the original post)
}
```

I implemented Yolov3 following the official paper http://arxiv.org/pdf/1804.02767. By doing this, I could change the activation function, anchor boxes, and grid scales directly. But I finished the implementation on Friday, caught a (really bad) cold on Saturday, and gave up on the competition on Sunday, so it was very disappointing to implement the model and not be able to compete. I implemented Yolov3 to have more flexibility in configuring the model, since I was not able to solve the problem of detecting small boxes using the hyperparameters described on the official Ultralytics website.
I would like to know how you addressed this problem. How did you solve it?
@robson_dsp Sorry to hear that you caught a cold, and thanks for sharing your approach.
I hope you are feeling better now :)