Congrats to the winners of this challenge! Well done!
It would be great if the top-ranked teams (the top 20, for instance, or more if you want :D) could share some insights into the solutions they implemented (or maybe a GitHub link :D).
On our side, we scored 44.13 (rank 42 on the final leaderboard).
We used an optimization algorithm (scipy) to minimize the challenge loss function, focusing on initialization, as we noticed that the results depended heavily on it.
We tried different initializations:
- Gaussian, based on the dataset distribution
- EM algorithm (Gaussian Mixtures)
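For anyone who wants to play with the idea, here is a minimal sketch of the two initializations feeding a scipy optimizer. It is not our competition code: the loss (sum of distances from each accident to its nearest vehicle), the synthetic accident data, the choice of Nelder-Mead, and all function names are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.mixture import GaussianMixture


def total_distance(flat_positions, accidents):
    """Assumed loss: sum of distances from each accident to its nearest vehicle."""
    positions = flat_positions.reshape(-1, 2)
    # Pairwise distances, shape (n_accidents, n_vehicles)
    d = np.linalg.norm(accidents[:, None, :] - positions[None, :, :], axis=2)
    return d.min(axis=1).sum()


def gaussian_init(accidents, n_vehicles, rng):
    """Draw starting positions from a single Gaussian fitted to the data."""
    mean = accidents.mean(axis=0)
    cov = np.cov(accidents, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_vehicles)


def gmm_init(accidents, n_vehicles, seed=0):
    """Use the component means of a Gaussian Mixture (fitted with EM) as a start."""
    gmm = GaussianMixture(n_components=n_vehicles, random_state=seed).fit(accidents)
    return gmm.means_


def optimise(init, accidents):
    """Refine the starting positions with a derivative-free scipy optimizer."""
    res = minimize(total_distance, init.ravel(), args=(accidents,), method="Nelder-Mead")
    return res.x.reshape(-1, 2), res.fun


rng = np.random.default_rng(0)
# Synthetic "accidents" around three hypothetical hot spots
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
accidents = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centres])

results = {}
inits = {"gaussian": gaussian_init(accidents, 3, rng), "gmm": gmm_init(accidents, 3)}
for name, init in inits.items():
    init_loss = total_distance(init.ravel(), accidents)
    positions, final_loss = optimise(init, accidents)
    results[name] = (init_loss, final_loss, positions)
    print(f"{name}: loss {init_loss:.2f} -> {final_loss:.2f}")
```

Comparing the two printed lines on your own data is a quick way to see how much the starting point matters for the final loss.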
Also, we did not manage to properly include the hour / day / month information in our clustering, as it seemed to lead to overfitting (same for the other variables, although we did not spend much time on them).
Did you manage to leverage other variables? Or is your solution based only on the "historical accidents data"?
Thank you for sharing.
What is the EM algorithm, please?
It is an optimization algorithm (expectation-maximization) that is used, among other things, to fit a clustering model called "Gaussian Mixtures" (scikit-learn link: https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html)
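To make the idea concrete, here is a rough hand-written sketch of EM for a two-component Gaussian mixture in one dimension. It only illustrates the alternation between the E-step and the M-step; scikit-learn's implementation is more general and more robust. The function name, the synthetic data, and the starting guesses are assumptions for this example.

```python
import numpy as np


def em_gmm_1d(x, n_iter=100):
    """Fit a two-component 1D Gaussian mixture with plain EM iterations."""
    # Crude starting guesses: one component at each end of the data
    mu = np.array([x.min(), x.max()], dtype=float)
    sigma = np.array([x.std(), x.std()], dtype=float)
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: probability that each point belongs to each component
        dens = weight * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (
            sigma * np.sqrt(2 * np.pi)
        )
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and standard deviations
        nk = resp.sum(axis=0)
        weight = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return weight, mu, sigma


rng = np.random.default_rng(0)
# Synthetic data: 30% of points around -2, 70% around 3
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
weight, mu, sigma = em_gmm_1d(x)
print("weights:", weight, "means:", mu, "stds:", sigma)
```

The fitted means play the role of cluster centres, which is why they can serve as starting positions for another optimizer.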
I think many of the top places are due to randomness. For example, I jumped from 83rd to 7th place with a very simple solution which uses fixed positions for all cars at all times)))
What algorithm did you use?
A simple genetic algorithm for minimization, without any ML.
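Roughly, such an approach could look like the sketch below. This is not the actual submission: the loss (sum of distances from each accident to its nearest car), the synthetic data, and every parameter value are assumptions chosen for illustration.

```python
import numpy as np


def total_distance(positions, accidents):
    """Assumed loss: sum of distances from each accident to its nearest car."""
    d = np.linalg.norm(accidents[:, None, :] - positions[None, :, :], axis=2)
    return d.min(axis=1).sum()


def genetic_minimise(accidents, n_cars=3, pop_size=40, n_gen=100, sigma=0.3, seed=0):
    """Search for fixed car positions with selection, crossover and mutation."""
    rng = np.random.default_rng(seed)
    lo, hi = accidents.min(axis=0), accidents.max(axis=0)
    # Each individual is one full set of car positions, shape (n_cars, 2)
    pop = rng.uniform(lo, hi, size=(pop_size, n_cars, 2))
    history = []
    for _ in range(n_gen):
        fitness = np.array([total_distance(ind, accidents) for ind in pop])
        order = np.argsort(fitness)
        history.append(fitness[order[0]])
        # Selection: the best half survives unchanged
        parents = pop[order[: pop_size // 2]]
        n_children = pop_size - len(parents)
        idx_a = rng.integers(len(parents), size=n_children)
        idx_b = rng.integers(len(parents), size=n_children)
        # Crossover: each car position is copied from one of two random parents
        mask = rng.random((n_children, n_cars, 1)) < 0.5
        children = np.where(mask, parents[idx_a], parents[idx_b])
        # Mutation: small Gaussian jitter on every child
        children = children + rng.normal(0.0, sigma, size=children.shape)
        pop = np.concatenate([parents, children])
    fitness = np.array([total_distance(ind, accidents) for ind in pop])
    best = pop[np.argmin(fitness)]
    return best, fitness.min(), history


rng = np.random.default_rng(1)
# Synthetic "accidents" around three hypothetical hot spots
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
accidents = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centres])

best, best_loss, history = genetic_minimise(accidents)
print("first generation best:", round(history[0], 2), "final best:", round(best_loss, 2))
```

Because the best half of each generation is kept unchanged, the best loss can never get worse from one generation to the next.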
I'll check that out
Haha yes @personnon, I guess randomness plays a big role here.
However, I believe that the 3 best solutions were not totally random and that those teams managed to extract some meaningful insights.