This was a great challenge and I’m grateful to have achieved 1st place on the private leaderboard, also my first 1st-place finish on a private LB, which is awesome! Huge thanks to my teammate @wuuthraad and to the competition organizers for setting up such a challenging problem.
Our solution used LightGBM with carefully constructed temporal features to predict congestion across all required future horizons. One of the biggest challenges was the weak correlation between local CV and the public leaderboard, which meant trusting cross-validation heavily and tracking experiments and improvements thoroughly. Because of the strong class imbalance, we also relied on a two-stage balancing strategy: downsampling fully free-flowing sequences, then oversampling minority classes per fold during training, which optimized macro-F1 without leaking information into validation. A custom macro-F1 eval function handled early stopping.
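To make the two-stage balancing and the custom eval concrete, here is a minimal sketch of the idea (column names, the downsampling fraction, and the free-flow definition are illustrative, not the exact notebook code):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score

def downsample_free_flow(df, label_col="congestion", frac=0.3, seed=42):
    """Stage 1: keep only a fraction of fully free-flowing (class 0) rows."""
    free = df[df[label_col] == 0]
    rest = df[df[label_col] != 0]
    return pd.concat([free.sample(frac=frac, random_state=seed), rest])

def oversample_minority(train_df, label_col="congestion", seed=42):
    """Stage 2: inside a fold's TRAIN split only, resample minority classes
    up to the majority count, so the validation split is never touched."""
    counts = train_df[label_col].value_counts()
    target = counts.max()
    parts = []
    for cls, n in counts.items():
        part = train_df[train_df[label_col] == cls]
        if n < target:
            part = part.sample(n=target, replace=True, random_state=seed)
        parts.append(part)
    return pd.concat(parts).sample(frac=1.0, random_state=seed)

def f1_macro_eval(preds, dataset):
    """Custom LightGBM eval so early stopping tracks macro-F1 directly."""
    y_true = dataset.get_label()
    if preds.ndim == 1:  # older LightGBM passes a flat, class-major array
        preds = preds.reshape(-1, len(y_true)).T
    y_hat = preds.argmax(axis=1)
    return "f1_macro", f1_score(y_true, y_hat, average="macro"), True
```

The key point is that oversampling happens per fold, after the train/validation split, so duplicated minority rows can never appear on both sides of the split.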
The training data is constructed using strict, gap-free 15-minute windows, grouped by camera, and only samples with all future targets present are retained after building out lag features. This enforces temporal integrity and fully respects the embargo and real-time inference constraints. I also experimented with sequence-based LSTM models trained on full 15-minute multivariate inputs to predict all future minutes jointly, but these did not outperform the per-horizon LightGBM models in terms of generalization.
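As a rough illustration of that windowing, the sketch below builds per-camera lag features, checks that every step in the lag window is strictly contiguous, and keeps only rows with all future targets present (column names, the one-minute cadence, and the window length are illustrative, not the exact notebook code):

```python
import pandas as pd

def build_windows(df, target_col="congestion", n_lags=15,
                  future_steps=(3, 4, 5, 6, 7),
                  group_col="view_label", time_col="timestamp"):
    """Per-camera lag features over strictly contiguous readings; only rows
    with a gap-free history and every future target present are kept."""
    df = df.sort_values([group_col, time_col]).copy()
    g = df.groupby(group_col)
    # lag features: the past n_lags readings
    for lag in range(1, n_lags + 1):
        df[f"{target_col}_lag{lag}"] = g[target_col].shift(lag)
    # future targets (these are dropped from X before training)
    for step in future_steps:
        df[f"{target_col}_t{step}"] = g[target_col].shift(-step)
    # gap check: each step back across the lag window must be exactly 1 minute
    contiguous = pd.Series(True, index=df.index)
    for lag in range(1, n_lags + 1):
        contiguous &= g[time_col].diff(lag) == pd.Timedelta(minutes=lag)
    has_all_targets = df[[f"{target_col}_t{s}" for s in future_steps]].notna().all(axis=1)
    return df[contiguous & has_all_targets]
```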
In addition, I explored video-derived features extracted using YOLO tracking (e.g. average, max, and standard deviation of vehicle counts across frames). These features consistently improved LSTM validation performance, but did not improve LightGBM results and did not generalize to the test set. In practice, the tree-based models appeared to generalize better by focusing on cleaner temporal signals, while the higher-capacity sequence models were more sensitive to the noise introduced by video-level features at test time.
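The clip-level aggregation itself was straightforward; roughly (a simplified sketch, assuming a list of per-frame vehicle counts from the tracker, with hypothetical feature names):

```python
import numpy as np

def aggregate_track_counts(per_frame_counts):
    """Collapse per-frame vehicle counts from a YOLO tracker into
    clip-level statistics (mean / max / std across frames)."""
    counts = np.asarray(per_frame_counts, dtype=float)
    return {
        "veh_count_mean": counts.mean(),
        "veh_count_max": counts.max(),
        "veh_count_std": counts.std(),
    }
```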
Please find the model training notebook and repository for our final solution at the link below:
Notebook: https://github.com/daniel-bru/Barbados-Traffic-Analysis-Solution/blob/main/modelling_v2.ipynb
Repo: https://github.com/daniel-bru/Barbados-Traffic-Analysis-Solution/tree/main
Repo & notebook notes: End-to-end experiments can be run with `modelling_v2.ipynb`. Each experiment is saved under `lgbm_training_history/`; to run a new one, just update the notebook and set a new experiment name in the notebook config.
Curious to hear others' approaches, and whether extracting features from the videos helped improve your models.
Thank you Daniel for sharing your solution. I really appreciate it.
I have one question though: doesn't creating future targets leak future information into the features?
def create_future_target_features(df, target_cols, future_steps=[3, 4, 5, 6, 7]):
    """Create future target features for prediction"""
    df_feat = df.copy()
    for col in target_cols:
        for step in future_steps:
            # shift(-step) pulls the value from `step` rows ahead, per camera view
            df_feat[f'{col}_t{step}'] = df_feat.groupby('view_label')[col].shift(-step)
    return df_feat
The future targets won't be there in production yes?
@Koleshjr, Thanks. That function is used to create the dataset for training, but when we actually start training to split X and y, all those targets are dropped from X. You will see that in the prepare_modeling_data() function👍
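In spirit, the split looks something like this (a simplified sketch, not the actual prepare_modeling_data(); column names follow the snippet above):

```python
import re
import pandas as pd

def split_X_y(df, target_col, step, id_cols=("view_label", "timestamp")):
    """Drop every future-target column (*_t3 ... *_t7) from X and keep only
    the single horizon being trained as y, so no future information
    reaches the features."""
    future_cols = [c for c in df.columns if re.search(r"_t\d+$", c)]
    y = df[f"{target_col}_t{step}"]
    X = df.drop(columns=future_cols + [c for c in id_cols if c in df.columns])
    return X, y
```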
I have actually seen that now. My bad. You used it for filtering.
Thanks again for sharing 🤝
Awesome, yes I used it to filter out what shouldn't be there and only point to the correct target for y
Can I ask which were the main features that led to the most improvement?
All the time-based features were crucial and worked well with the lag features. The most improvement came from a lower learning rate and the oversampling approaches.
overall it was a challenging but really great competition, I would like to thank @21db and the @zindi team for a great competition
@21db was the key driver behind the overall approach. We experimented with a wide range of modeling techniques, including deep learning, linear regression, and various ensembling strategies, to identify the most robust solution. As mentioned, during submission we observed a lack of correlation between our local cross-validation results and leaderboard performance. This was particularly evident when I was training a CatBoost model: although it significantly outperformed LightGBM in local CV, its public leaderboard performance was comparable to LightGBM, and its private leaderboard score dropped substantially relative to LightGBM. We also explored the inclusion of video-based features; however, these resulted in little to no improvement in overall performance. Based on this, @21db made the decision to focus primarily on LightGBM and invest effort into hyperparameter tuning, which in the end paid off.
Let's star this repo guys ⭐. Amazing work by the team @21db and @wuuthraad! Congratulations, and we really appreciate you sharing your code!
Thanks @CodeJoe