
Arm UNICEF Disaster Vulnerability Challenge

Helping Malawi
$10 000 USD
Challenge completed over 1 year ago
1189 joined
347 active
Start: Mar 15, 24
Close: Jun 23, 24
Reveal: Jun 23, 24
ivan_panshin
First place solution
Connect · 24 Jun 2024, 17:12 · 4

First of all, I would like to thank the organizers of this fantastic competition. This is my first time on Zindi. Hopefully it won't be the last!

Main idea

We combine Object Detection and Regression. In particular, my fantastic teammate trained two mmdetection models (Co-Dino with Swin-Large and Co-Deformable DETR with Swin-Base) to predict bboxes for the specified categories.

I decided to take a different route and tackled the task directly: I trained a regressor to predict the number of houses right away, without any Object Detection or Segmentation. In terms of models, I chose maxvit_base_tf_512.in21k_ft_in1k from `timm`.

Preprocessing

For validation, I created StratifiedGroup folds based on the number of bboxes per image and on image_ids. My teammate used StratifiedGroup folds based on empty/non-empty images and image_ids.

Additionally, OD models were trained only on non-empty images. Regression was trained on all images.

Training

OD models were trained for 16 and 36 epochs with a global batch size of 4. Regression models were trained for 600 epochs with a global batch size of 64 (to simplify validation, we actually ran 60 epochs, with each epoch iterating over the dataset concatenated with itself 10 times).
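The "dataset concatenated with itself" trick can be sketched as a thin wrapper, so that 60 long epochs cover 600 passes over the data and validation runs only once per long epoch. The class name `RepeatedDataset` is hypothetical; this is an illustration of the idea, not our training code.

```python
class RepeatedDataset:
    """Expose a base dataset `times` times in a row as one long epoch."""

    def __init__(self, base, times):
        if times < 1:
            raise ValueError("times must be >= 1")
        self.base = base
        self.times = times

    def __len__(self):
        return len(self.base) * self.times

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Wrap around so index len(base) maps back to sample 0, and so on.
        return self.base[index % len(self.base)]
```

Since augmentations are applied at load time, each repeat of a sample is seen with a different random augmentation. In PyTorch, `torch.utils.data.ConcatDataset([ds] * 10)` should give the same effect.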

For OD, multi-scale training was utilized on resolutions from (480, 1536) up to (1536, 1536). For regression, static (512, 512) was used.
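As a rough illustration of multi-scale training, the sketch below samples a target scale per image and computes the output shape of an aspect-preserving resize. Only the two endpoints, (480, 1536) and (1536, 1536), come from our setup; the step of 32, the uniform sampling, and the function names are assumptions.

```python
import random


def sample_scale(rng, short_min=480, short_max=1536, long_side=1536, step=32):
    """Pick a (short, long) target; the short side is a multiple of `step`."""
    short = rng.randrange(short_min, short_max + 1, step)
    return short, long_side


def resized_shape(height, width, scale):
    """Output (height, width) after an aspect-preserving resize into `scale`.

    The image is scaled as much as possible while its long edge stays within
    max(scale) and its short edge stays within min(scale).
    """
    short_edge, long_edge = min(scale), max(scale)
    factor = min(long_edge / max(height, width), short_edge / min(height, width))
    return int(height * factor + 0.5), int(width * factor + 0.5)
```

For example, `resized_shape(1000, 1000, sample_scale(random.Random(0)))` gives the shape one training sample would be resized to.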

In terms of augs, several standard ones were used (Flips, Rotate, BrightnessContrast), plus one custom aug. In particular, for the regression model we created a CoarseDropout that supports bboxes. In other words, when masking parts of the image, we keep track of the masked bboxes and update the number of houses accordingly.
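A bbox-aware CoarseDropout could look roughly like the sketch below. It is a simplified NumPy illustration rather than our implementation: the visibility rule (a box stops counting once less than half of it remains unmasked), the hole sizes, and the function names are assumptions.

```python
import numpy as np


def coarse_dropout_with_count(image, boxes, holes, min_visible=0.5, fill=0):
    """Mask rectangular holes and recount the boxes that remain visible.

    image: H x W x C array; boxes and holes: (x1, y1, x2, y2) in integer pixels.
    A box is dropped from the count once less than `min_visible` of its area
    is unmasked (hypothetical rule).
    """
    out = image.copy()
    masked = np.zeros(image.shape[:2], dtype=bool)
    for x1, y1, x2, y2 in holes:
        out[y1:y2, x1:x2] = fill
        masked[y1:y2, x1:x2] = True
    kept = []
    for x1, y1, x2, y2 in boxes:
        area = (y2 - y1) * (x2 - x1)
        hidden = int(masked[y1:y2, x1:x2].sum())
        if area > 0 and (area - hidden) / area >= min_visible:
            kept.append((x1, y1, x2, y2))
    return out, kept, len(kept)


def random_holes(rng, height, width, n_holes=4, size=64):
    """Sample square holes of side `size`, clipped to lie inside the image."""
    holes = []
    for _ in range(n_holes):
        x1 = int(rng.integers(0, max(width - size, 0) + 1))
        y1 = int(rng.integers(0, max(height - size, 0) + 1))
        holes.append((x1, y1, min(x1 + size, width), min(y1 + size, height)))
    return holes
```

The returned count is what the regressor would be trained against for the augmented image; per-class counts would be updated the same way, one class at a time.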

Tricks

- EMA for training stability.

- bfloat16 for numerical stability.

- Post-processing: if the model predicts at least 50 tin houses, increase the prediction by 5%. This trick worked on a single-fold CV, OOF, and public. However, the boost was marginal and wasn't needed to secure 1st place.

- Use OD models for thatch prediction, but use both OD and regression for tin and other.

- Set the OD confidence threshold to 0.45, since it worked well on validation and we didn't want to tune it further and risk overfitting.
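The inference-time tricks above (confidence threshold, OD-only thatch, OD plus regression for tin and other, and the 5% boost) can be put together as in the sketch below. The function names, the equal-weight average of OD and regression counts, and applying the boost after blending are assumptions for illustration; only the 0.45 threshold, the 50-house cutoff, and the 5% increase come from our setup.

```python
def count_detections(detections, threshold=0.45):
    """Count boxes per class with confidence >= threshold.

    detections: iterable of (class_name, score) pairs from the OD models.
    """
    counts = {"thatch": 0, "tin": 0, "other": 0}
    for class_name, score in detections:
        if score >= threshold:
            counts[class_name] += 1
    return counts


def combine_counts(od_counts, reg_counts, od_weight=0.5):
    """Thatch comes from OD only; tin and other blend OD with regression.

    The equal-weight average is an assumption, not a reported value.
    """
    combined = {"thatch": float(od_counts["thatch"])}
    for name in ("tin", "other"):
        combined[name] = (
            od_weight * od_counts[name] + (1 - od_weight) * reg_counts[name]
        )
    return combined


def boost_large_tin(counts, min_tin=50, factor=1.05):
    """Raise the tin prediction by 5% when it is at least `min_tin`."""
    boosted = dict(counts)
    if boosted["tin"] >= min_tin:
        boosted["tin"] *= factor
    return boosted
```

A per-image prediction would then be `boost_large_tin(combine_counts(count_detections(dets), reg_counts))`, where `reg_counts` holds the regressor's outputs for that image.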

Hardware

The OD models were trained on 4 x RTX 3090; training both models takes roughly 16 hours.

The regression model was trained with 4 x A6000 Ada and it takes roughly 7 to train.

Discussion 4 answers
nymfree

Thanks for sharing. Amazing solution. I tried and failed to train Co-Dino (too long to train on Kaggle) and Co-detr (couldn't make it work). What was the CV score of the regression model alone and of the mmdet ones?

24 Jun 2024, 17:49
Upvotes 0
Jaw22
Zindi africa

Congratulations, Ivan!!

24 Jun 2024, 18:32
Upvotes 0

Congratulations !!

24 Jun 2024, 18:57
Upvotes 0

Congratulations! Regression, of course! Thanks for sharing your approach.

24 Jun 2024, 22:42
Upvotes 0