We hope you’re managing to work smoothly with the video data. We’ve made a few important updates to help you get the most out of this challenge:
Prediction window updated
This challenge is tough — we’ve adjusted the prediction window to 5 minutes into the future. You will now have the following structure: Training data → Test input → 2-minute embargo (operational lag) → 5-minute test output.
Inference data clarified
The inference dataset is now explicitly labelled as test_input, so you know exactly what to use for inference.
Clarifying the “No Backpropagation” rule
As stated in the challenge design, backpropagation should not be used in training or inference in a way that uses future information to predict the past. Your solution should operate in real time, meaning:
Backpropagation within a training loop (i.e. updating model weights during normal training) is, of course, allowed — just keep the real-time deployment context in mind.
In other words, you should not use future data to predict the past, as this would be impossible in a real-world deployment.
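To make the leakage rule concrete, here is a minimal sketch (hypothetical feature engineering, not part of the challenge code): a real-time model may only use observations up to the current minute, so a "causal" rolling mean is fine, while a centered mean that peeks ahead is not.

```python
# Hypothetical example: per-minute counts from a video stream.
# A causal feature uses only past/current minutes; a leaky feature
# looks ahead and would be impossible to compute in real time.

def causal_mean(series, t, window=3):
    """Mean of the last `window` values up to and including minute t."""
    start = max(0, t - window + 1)
    return sum(series[start:t + 1]) / (t + 1 - start)

def leaky_mean(series, t, window=3):
    """Mean centered on minute t -- reads series[t+1], so it is NOT allowed."""
    start = max(0, t - window // 2)
    end = min(len(series), t + window // 2 + 1)
    return sum(series[start:end]) / (end - start)

traffic = [10, 12, 11, 30, 28]  # toy per-minute counts

# At minute 2 the causal feature sees only minutes 0-2 ...
print(causal_mean(traffic, 2))  # 11.0
# ... while the leaky feature already "knows" the jump at minute 3.
print(leaky_mean(traffic, 2))
```

Backpropagation itself is irrelevant here; the point is simply which minutes each feature is allowed to read.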
Data sharing and licensing update
Videos may now be uploaded to Kaggle, and the dataset license has been updated to CC BY 4.0.
These updates are designed to keep the competition fair, transparent, and fun for everyone.
If you’ve already started working, please re-download the updated files on Zindi (note that the video data on the bucket has NOT changed) and review the latest challenge description before resubmitting your solution.
Thank you for your patience and all the great feedback so far — and good luck!
I suspected this is what was meant by backpropagation being disallowed (this would be a data leakage issue). Thanks!
How are you doing with the error metric ;) It's an interesting one!
Wise choice for the competition! I think in the real world ROI is the true metric, so whatever works for the sponsors to maximize value from the solution!
What is the name of the dataset on Kaggle?
Training data → Test input → 2-minute embargo (operational lag) → 5-minute test output.
Does it mean that each video has a time limit of 2 minutes for reasoning?
Hi, you have training data to split and build as you see fit.
Then you have 15 segments of test input to feed a "forecasting" model that predicts/forecasts the 5-minute test output.
In a perfect, instantaneous world we would have let you start forecasting/predicting from minute 16, but as always there is inference time, plus video processing and storage delays, to take into consideration. In other words, inference should be as fast as possible, because the 2-minute operational lag also covers video processing and storage access/saving delays.
So you are predicting/forecasting from minute 18 to 23. I hope this helps a bit.
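The timeline above can be sketched in a few lines (the minute indices and the helper function are illustrative assumptions, not official challenge code; check the data description for the authoritative boundaries):

```python
def forecast_window(last_input_minute, embargo=2, horizon=5):
    """Return the minutes to forecast, given the last observed input
    minute, the embargo length, and the forecast horizon (all in minutes)."""
    start = last_input_minute + embargo + 1
    return list(range(start, start + horizon))

# Test input covers minutes 1-15, then a 2-minute embargo (16-17),
# then the 5-minute forecast window starting at minute 18.
print(forecast_window(15))  # [18, 19, 20, 21, 22]
```

The embargo simply shifts the forecast start; it is not a per-video compute budget, although it does imply that inference plus video processing must fit inside those 2 minutes in a real deployment.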
Hi Amy
What is the expectation in the real-time application of the model?