My solution combines multi-modal data fusion and robust training techniques to tackle the solar panel counting challenge. Key components include:
Model Architecture
- Backbone: EfficientNetV2 variant for image feature extraction
- Metadata Integration: encoded image origin (D/G) and placement type (roof/ground) via one-hot/dense embeddings
- Fusion: concatenated visual features + metadata processed through a 2-layer regression head
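The fusion step can be sketched in NumPy as a toy stand-in for the real model: backbone features are concatenated with one-hot metadata and passed through a 2-layer head. The one-hot codings, layer sizes, and function names here are my assumptions, not the author's code.

```python
import numpy as np

def fuse_and_regress(visual_feats, origin, placement, rng):
    """Concatenate backbone features with one-hot metadata and run a
    2-layer regression head (toy NumPy sketch, randomly initialised)."""
    # Hypothetical one-hot codings for the two metadata fields.
    origin_oh = np.array([1.0, 0.0]) if origin == "D" else np.array([0.0, 1.0])
    place_oh = np.array([1.0, 0.0]) if placement == "roof" else np.array([0.0, 1.0])
    x = np.concatenate([visual_feats, origin_oh, place_oh])

    # 2-layer head: hidden ReLU layer, scalar count output.
    w1 = rng.standard_normal((x.size, 16)) * 0.1
    w2 = rng.standard_normal((16, 1)) * 0.1
    hidden = np.maximum(x @ w1, 0.0)   # ReLU
    return float(hidden @ w2)          # predicted panel count (untrained)

rng = np.random.default_rng(0)
feats = rng.standard_normal(8)         # stand-in for backbone output
pred = fuse_and_regress(feats, "D", "roof", rng)
```

In the real model the head weights are of course learned end-to-end with the backbone; the sketch only shows the data flow of the concatenation.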
Data Strategy
- Cross-Validation: Stratified K-Fold to handle class imbalance
- Augmentation Pipeline: dynamic spatial transforms (geometric + color)
- Targeted dropout patterns to reduce overfitting
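One way to stratify a K-Fold split on a count target is to bin the counts into quantile strata and deal each stratum round-robin into folds. This is a sketch under my own assumptions (bin count, fold count, and seed are illustrative, not the author's settings):

```python
import numpy as np

def stratified_fold_ids(counts, n_folds=5, n_bins=4, seed=42):
    """Assign a fold id to each sample so every fold sees a similar
    spread of panel counts (counts binned into quantile strata)."""
    counts = np.asarray(counts)
    # Quantile edges so each bin holds roughly the same number of images.
    edges = np.quantile(counts, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(counts, edges)

    rng = np.random.default_rng(seed)
    fold = np.empty(len(counts), dtype=int)
    for b in np.unique(bins):
        idx = np.where(bins == b)[0]
        rng.shuffle(idx)
        # Deal shuffled members of this stratum round-robin into folds.
        fold[idx] = np.arange(len(idx)) % n_folds
    return fold

counts = [0, 1, 1, 2, 3, 5, 8, 8, 12, 20, 20, 35]
folds = stratified_fold_ids(counts, n_folds=3)
```

`sklearn.model_selection.StratifiedKFold` does the same job once the target is binned into discrete labels.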
Training Protocol
- Loss: MAE-focused objective with gradient scaling
- Optimization: AdamW with cosine LR scheduling
- Infrastructure: mixed-precision training for efficiency
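The cosine LR schedule paired with AdamW can be written in a few lines; the `base_lr` and `min_lr` values below are placeholder assumptions, not the values used in training:

```python
import math

def cosine_lr(step, total_steps, base_lr=3e-4, min_lr=1e-6):
    """Cosine-annealed learning rate: starts at base_lr and decays
    smoothly to min_lr over total_steps."""
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

In PyTorch the same behaviour comes from `torch.optim.lr_scheduler.CosineAnnealingLR` wrapped around an `AdamW` optimizer, with `torch.cuda.amp` (autocast + GradScaler) supplying the mixed-precision and gradient-scaling pieces.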
Inference Enhancements
- Test-time augmentation (TTA) with consistent preprocessing
- Prediction aggregation from multiple model checkpoints
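Combining TTA with checkpoint averaging amounts to a double mean over augmented views and models. A minimal sketch, where toy callables stand in for real checkpoints and flips stand in for the full augmentation set:

```python
import numpy as np

def tta_predict(image, models):
    """Average predictions over horizontal/vertical flips and over
    several model checkpoints (sketch; `models` are callables)."""
    views = [image, np.flip(image, axis=1), np.flip(image, axis=0)]
    preds = [m(v) for m in models for v in views]
    return float(np.mean(preds))

# Two stand-in "checkpoints": both just sum pixel intensities.
models = [lambda img: img.sum(), lambda img: img.sum() * 1.1]
image = np.arange(12.0).reshape(3, 4)
pred = tta_predict(image, models)   # flips don't change a sum, so this
                                    # averages only across the checkpoints
```

The "consistent preprocessing" point matters here: every view must go through the same resize/normalize pipeline as training, or the averaged predictions drift.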
Validation Insights
- MAE improved steadily across epochs, from ~2.35 down to ~1.25
- Metadata integration provided an ~8% performance boost vs the image-only baseline
This approach balances model capacity, data diversity, and regularization to handle the dataset's unique challenges. Would love to hear about others' strategies for metadata utilization and augmentation design!
Thank you @zulu40
Interesting! All along I was just training on the images alone. Thank you @zulo40. Much appreciated
My pleasure
What's your local MAE across all folds?
My average validation MAE across folds was about 1.25155
Nice, thank you for sharing
In the future I think I will experiment with Vision Transformers
The competition has a file-size limit (many Kaggle competitions cap submissions around 20–30 MB). Even if your file has the same number of rows, differences in formatting or precision can make it much bigger.
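To see how precision alone changes file size, here is a sketch that writes the same predictions at full float precision and rounded to three decimals. The column names (`id`, `pred`) are hypothetical:

```python
import csv
import io

# Synthetic predictions with long decimal expansions.
preds = [i * 0.123456789012345 for i in range(1000)]

def to_csv(values, fmt):
    """Write an (id, pred) submission to a string using formatter `fmt`."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "pred"])  # hypothetical header
    for i, v in enumerate(values):
        writer.writerow([i, fmt(v)])
    return buf.getvalue()

full = to_csv(preds, lambda v: repr(v))        # full float precision
rounded = to_csv(preds, lambda v: f"{v:.3f}")  # 3 decimal places
# Same number of rows, but the rounded file is considerably smaller.
```

Rounding to a few decimals is harmless for a count-regression metric like MAE and keeps the submission well under the cap.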