Congrats to the soon-to-be winners of this challenge. It's an interesting problem, the first of its kind for me, and the biggest data I've ever had to deal with. I thank Zindi for bringing it to this platform, JohnWhitaker for making it easier for us to get started (I'm bad with dataframes, thanks John), and my teammates for making this a learning process.
I'm both relieved that it's over and frustrated that I didn't finish where I wanted to finish.
However, as someone who's active on this platform, and will be in the future, I'd like to point out one last thing to Zindi.
The leak was a disaster. Announcing that anyone using the leak would be disqualified, or that their solution would not be taken into consideration, surely removed many submissions that were taking advantage of it. But there is another way the leak can and will be used, and it is something Zindi probably cannot detect.
For example, you can perform extra tuning on your model to predict the count of observations in Vehicles2016_2019 per (date, hour), and that is a use of the leak. We hope Zindi can catch such errors early on, or put appropriate measures in place to prevent anyone from abusing them.
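To make the concern concrete, here is a minimal sketch of the kind of aggregate in question, assuming the vehicle data is loaded into a pandas dataframe with a timestamp column. The column name `timestamp` and the helper `count_per_date_hour` are hypothetical, the toy rows are made up, and this is not our team's code.

```python
import pandas as pd


def count_per_date_hour(df, time_col="timestamp"):
    """Count rows per (date, hour).

    `time_col` is a hypothetical column name; the real file may use another.
    """
    ts = pd.to_datetime(df[time_col])
    return (
        df.assign(date=ts.dt.date, hour=ts.dt.hour)
        .groupby(["date", "hour"])
        .size()  # number of rows in each (date, hour) slot
        .rename("n_observations")
        .reset_index()
    )


# Toy stand-in for Vehicles2016_2019; the values are invented for illustration.
toy = pd.DataFrame(
    {
        "timestamp": [
            "2019-01-01 08:05",
            "2019-01-01 08:40",
            "2019-01-01 09:10",
            "2019-01-02 08:15",
        ]
    }
)
counts = count_per_date_hour(toy)
print(counts)
```

A model tuned against a table like `counts` would be using the leak even if the count never appears as an explicit feature, which is why it is hard to detect from a submission alone.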
PS: this is not an effort to disqualify anyone. It's just something my teammates and I realized right after the leak, and we decided not to use it (code will be shared on GitHub). It's also a reminder to Zindi that such mistakes can ruin a whole challenge, and with it the final product for the sponsor.
Thank you too