Hello everyone,
It's my first time posting here; I signed up very recently, having first heard of Zindi at DLIndaba 2019 in Kenya.
This challenge caught my attention. I probably won't make it to the end since time is almost up, but I'd like to share my approach.
The data processing and preparation in the introductory notebook seemed fine to me at first, but looking at the images, the problem itself is very challenging: the objects of interest are very small most of the time, with very fine details distinguishing them.
Most YOLO models might not be optimal for this, in my opinion, so I modified YOLOv11 to handle higher input resolutions and to make predictions from earlier activations (P2, for example). Also, Conv can be replaced with LightConv or GhostConv to keep the compute manageable.
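To make the P2 argument concrete, here is a quick stdlib-only sketch (the 1024-px input and 12-px object are hypothetical example values, not numbers from the challenge) of how many grid cells a small object spans at each pyramid level:

```python
# Each pyramid level Pk downsamples the input by stride 2**k,
# so a small object covers very few cells at deeper levels.

def cells_covered(obj_px: float, stride: int) -> float:
    """How many feature-map cells (per side) an object of obj_px pixels spans."""
    return obj_px / stride

strides = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}
obj = 12      # hypothetical tiny object, 12 px per side
imgsz = 1024  # hypothetical input resolution

for level, s in strides.items():
    grid = imgsz // s  # feature-map side length at this level
    print(f"{level}: {grid}x{grid} grid, object spans {cells_covered(obj, s):.2f} cells")
```

At P3 (stride 8) the 12-px object already spans fewer than two cells per side, and at P5 it falls below a single cell; a P2 head at stride 4 is what keeps enough spatial resolution for such objects.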
In addition, the augmentations enabled by Ultralytics can also affect the model. Take blurring: blur is fine in general, but when dealing with such fine details it can be misleading, so I disabled it completely and added some rotations and affine transformations instead.
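As an illustration of why geometric augmentations are safer here than blur (they move fine details around instead of destroying them), here is a minimal sketch of updating a bounding box under a rotation; the helper name is mine for illustration, not part of the Ultralytics API:

```python
import math

def rotate_box(box, angle_deg, cx, cy):
    """Rotate an axis-aligned box (x1, y1, x2, y2) about (cx, cy) and
    return the axis-aligned box enclosing the rotated corners."""
    x1, y1, x2, y2 = box
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    rotated = [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in corners
    ]
    xs = [p[0] for p in rotated]
    ys = [p[1] for p in rotated]
    return (min(xs), min(ys), max(xs), max(ys))

# A 90-degree rotation about the image centre relocates the box but
# preserves its area, so every fine detail inside it survives intact.
print(rotate_box((10, 20, 30, 40), 90, 50, 50))
```

A blur, by contrast, irreversibly averages away the high-frequency detail that separates these classes, which is why I turned it off.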
I'm sharing the network below for your reference, and maybe we can discuss it here. Looking forward to hearing from you all.
```yaml
# Backbone
backbone:
- [ -1, 1, Focus, [ 64, 3 ] ] #0: Focus, stride=2 -> P1/2
- [ -1, 1, Conv, [ 128, 3, 2 ] ] #1: Conv stride=2 -> P2/4
- [ -1, 2, C3k2, [ 256, False, 0.25 ] ] #2: C3k2 x2 -> refine P2
- [ -1, 1, Conv, [ 256, 3, 2 ] ] #3: Conv stride=2 -> P3/8
- [ -1, 2, C3k2, [ 512, False, 0.25 ] ] #4: C3k2 x2 -> refine P3
- [ -1, 1, Conv, [ 512, 3, 2 ] ] #5: Conv stride=2 -> P4/16
- [ -1, 2, C3k2, [ 512, True ] ] #6: C3k2 x2 -> refine P4
- [ -1, 1, Conv, [ 1024, 3, 2 ] ] #7: Conv stride=2 -> P5/32
- [ -1, 2, C3k2, [ 1024, True ] ] #8: C3k2 x2 -> refine P5
- [ -1, 1, SPPF, [ 1024, 5 ] ] #9: SPPF for global context P5
- [ -1, 2, C2PSA, [ 1024 ] ] #10: C2PSA x2 -> final P5 features

# Neck (FPN top-down path)
head:
# Top-down FPN
# P5 -> P4
- [ 10, 1, nn.Upsample, [ None, 2, "nearest" ] ] #11: Upsample P5 to P4 scale
- [ [ -1, 6 ], 1, Concat, [ 1 ] ] #12: Concat P5-up with P4
- [ -1, 2, C3k2, [ 512, False ] ] #13: Refine -> P4_fpn
# P4_fpn -> P3
- [ 13, 1, nn.Upsample, [ None, 2, "nearest" ] ] #14: Upsample P4_fpn to P3
- [ [ -1, 4 ], 1, Concat, [ 1 ] ] #15: Concat with P3
- [ -1, 2, C3k2, [ 256, False ] ] #16: Refine -> P3_fpn
# P3_fpn -> P2
- [ 16, 1, nn.Upsample, [ None, 2, "nearest" ] ] #17: Upsample P3_fpn to P2
- [ [ -1, 2 ], 1, Concat, [ 1 ] ] #18: Concat with P2
- [ -1, 2, C3k2, [ 256, False ] ] #19: Refine -> P2_fpn
# Bottom-up PAN
# P2_fpn -> P3_fpn
- [ 19, 1, Conv, [ 256, 3, 2 ] ] #20: Downsample P2_fpn to P3 scale
- [ [ -1, 16 ], 1, Concat, [ 1 ] ] #21: Concat with P3_fpn
- [ -1, 2, C3k2, [ 256, False ] ] #22: Refine -> P3_pan
# P3_pan -> P4_fpn
- [ 22, 1, Conv, [ 512, 3, 2 ] ] #23: Downsample P3_pan to P4
- [ [ -1, 13 ], 1, Concat, [ 1 ] ] #24: Concat with P4_fpn
- [ -1, 2, C3k2, [ 512, False ] ] #25: Refine -> P4_pan
# P4_pan -> P5
- [ 25, 1, Conv, [ 1024, 3, 2 ] ] #26: Downsample P4_pan to P5
- [ [ -1, 10 ], 1, Concat, [ 1 ] ] #27: Concat with original P5 features
- [ -1, 2, C3k2, [ 1024, True ] ] #28: Refine -> P5_pan
# Detection heads at P2_fpn, P3_pan, P4_pan, P5_pan
- [ [ 19, 22, 25, 28 ], 1, Detect, [ nc ] ] #29: Detect from four scales
```
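One practical consequence of adding the P2 detection head is worth noting: the prediction count roughly quadruples, since the stride-4 grid dominates. A quick stdlib-only sketch (the 1280-px input is a hypothetical choice, not a value from the config above):

```python
# Number of prediction cells per detection scale for a square input.
def grid_cells(imgsz: int, stride: int) -> int:
    side = imgsz // stride
    return side * side

imgsz = 1280  # hypothetical high-resolution input
strides = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}

counts = {level: grid_cells(imgsz, s) for level, s in strides.items()}
total = sum(counts.values())
for level, n in counts.items():
    print(f"{level}: {n} cells ({100 * n / total:.1f}% of all predictions)")
print(f"total: {total}")
```

Around three quarters of all predictions come from the P2 grid alone, which is part of why swapping Conv for LightConv or GhostConv matters when running at high resolution.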
Also, mosaic augmentation is not the best choice here; a probability of 1.0 is too high.
Interesting. Why, in your opinion, is mosaic not suitable here?
In my opinion, the details and features in this case are extremely fine and small, and to maintain a reasonable model input size you can't go too far. A mosaic augmentation that keeps a reasonable image size while preserving very fine features won't be easy.
So I tend to decrease the probability, to maybe 0.1. I haven't tried much; I just joined today and got 0.16, unfortunately.
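The size argument can be put in numbers with a small stdlib sketch (the 12-px object size is a hypothetical example):

```python
# A 2x2 mosaic packed into the same model input size scales each
# source image by roughly 0.5 per side, shrinking already-small objects.

def size_after_mosaic(obj_px: float, n_tiles_per_side: int = 2) -> float:
    """Approximate object size after packing an n x n mosaic into the
    original input resolution."""
    return obj_px / n_tiles_per_side

obj = 12  # hypothetical object size in pixels
print(f"before mosaic: {obj} px, spans {obj / 4:.1f} P2 cells")
after = size_after_mosaic(obj)
print(f"after 2x2 mosaic: {after} px, spans {after / 4:.1f} P2 cells")
```

A 12-px object is halved to 6 px, so even the stride-4 P2 head sees it in only about one and a half cells per side, which is why mosaic at probability 1.0 works against the fine details here.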
Your intuition about setting the mosaic probability to a low value is correct. Just tested it.
Did you ever run this and submit? What score did you get?
A quick experiment without much tuning got 0.16. At first it was 0.09, but after playing with the thresholds it reached 0.16 with the same weights.
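On the thresholds point: the same weights can score very differently depending on the confidence cutoff, because the metric trades precision against recall. A toy stdlib sketch (the detection scores and labels are invented for illustration, not from the challenge):

```python
# Sweep the confidence threshold over fixed predictions and watch the
# F1 score move without touching the model weights.

def f1_at_threshold(preds, n_gt, thresh):
    """preds: list of (confidence, matched_a_ground_truth_box).
    n_gt: number of ground-truth boxes."""
    kept = [tp for conf, tp in preds if conf >= thresh]
    if not kept:
        return 0.0
    tp = sum(kept)
    precision = tp / len(kept)
    recall = tp / n_gt
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented detections: (confidence, did it match a ground-truth box?)
preds = [(0.9, True), (0.8, True), (0.6, False), (0.5, True),
         (0.4, False), (0.3, False), (0.2, True)]
n_gt = 6

for t in (0.1, 0.3, 0.5, 0.7):
    print(f"threshold {t:.1f}: F1 = {f1_at_threshold(preds, n_gt, t):.3f}")
```

In this toy example the best cutoff is neither the lowest nor the highest, which matches the jump from 0.09 to 0.16 with identical weights.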