First of all, we would like to thank Zindi for this wonderful and very inspiring challenge, and congratulations to all the winners and competitors.
Here is the summary of my team's winning solution.
Our first intuition was to find the best static placement for each of the 8 time slots of the day, i.e. 6 ambulance locations for each 3-hour interval, and then to optimize the distance as much as possible according to the leaderboard scoring function.
The real challenges for us were:
Grouping crashes per 3-hour interval didn't help because it led to overfitting. Optimizing against all the crash locations, after removing outliers, worked better, but it still didn't give the best score.
After some analysis of the weather data, we kept only the accidents that occurred in weather conditions similar to those of the time intervals in the submission, in order to prioritize the test set. To preserve some information from the whole dataset, we used k-means to add some plausible, representative crash locations, where k is the best number of representative crash locations.
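For readers who want to try this step, here is a minimal sketch of the idea, not our exact code. It assumes each crash is a dict with hypothetical keys `lat`, `lon` and `weather`, that "similar weather" means an identical label, and it uses a small dependency-free Lloyd's k-means (in practice a library implementation such as scikit-learn's `KMeans` would do the same job).

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared distance).
            j = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                            + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def build_training_points(crashes, target_weather, k):
    """Crashes with matching weather, plus k representative points from all crashes."""
    similar = [(c["lat"], c["lon"]) for c in crashes if c["weather"] == target_weather]
    everything = [(c["lat"], c["lon"]) for c in crashes]
    return similar + kmeans(everything, k)

# Toy data, purely illustrative.
crashes = [
    {"lat": 0.0, "lon": 0.0, "weather": "rain"},
    {"lat": 0.0, "lon": 1.0, "weather": "dry"},
    {"lat": 1.0, "lon": 0.0, "weather": "rain"},
    {"lat": 10.0, "lon": 10.0, "weather": "dry"},
    {"lat": 10.0, "lon": 11.0, "weather": "rain"},
    {"lat": 11.0, "lon": 10.0, "weather": "dry"},
]
points = build_training_points(crashes, "rain", k=2)
```

The value of `k` is a tuning choice; the sketch does not show how to pick it.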
For each 3-hour interval, we used a combination of initial locations, all obtained from k-means.
Using separate initial locations for each interval gave a good score, but not the best one. After some analysis, we noticed groups of hours with a similar number of crashes, so we set the same initial locations for each group. Apart from the initial locations obtained from each interval, we also used ambulance locations obtained from the crash locations for the optimization.
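One simple way to form such groups is sketched below. This is only an illustration under our own assumptions: crashes are counted per 3-hour interval, and two intervals are treated as "similar" when their counts differ by at most a hypothetical `tolerance` from the first interval of the group.

```python
from collections import Counter

def group_intervals_by_count(crash_hours, tolerance):
    """Group the eight 3-hour intervals (0..7) whose crash counts are close."""
    counts = Counter(hour // 3 for hour in crash_hours)
    # Sort intervals by crash count so similar intervals end up adjacent.
    ordered = sorted(range(8), key=lambda i: counts[i])
    groups = [[ordered[0]]]
    for i in ordered[1:]:
        # Compare against the first (smallest-count) interval of the current group.
        if counts[i] - counts[groups[-1][0]] <= tolerance:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups

# Toy hours of day (0..23) at which crashes happened, purely illustrative.
crash_hours = [0, 1, 3, 4, 6, 7, 8, 6, 7, 9, 10, 11, 9, 10, 11]
groups = group_intervals_by_count(crash_hours, tolerance=1)
```

Every interval in the same group would then share one set of initial ambulance locations.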
Finally, we minimized the sum of distances from the scoring function using gradient descent.
If you have any questions, we will be happy to answer them in the comment section or in a private discussion.
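A minimal sketch of this last step follows. It assumes the score is the sum of straight-line (Euclidean) distances from each crash to its nearest ambulance, and it uses a fixed learning rate `lr` and step count that are illustrative values, not the ones we tuned. Because the objective is not smooth, the sketch keeps the best placement seen so far.

```python
import math

def total_distance(ambulances, crashes):
    """Sum over crashes of the distance to the nearest ambulance."""
    return sum(min(math.dist(a, c) for a in ambulances) for c in crashes)

def gradient_step(ambulances, crashes, lr):
    """One (sub)gradient descent step on total_distance."""
    grads = [[0.0, 0.0] for _ in ambulances]
    for c in crashes:
        # Only the nearest ambulance contributes to this crash's distance.
        j = min(range(len(ambulances)), key=lambda i: math.dist(ambulances[i], c))
        d = math.dist(ambulances[j], c)
        if d > 1e-12:  # gradient is undefined when the ambulance sits on the crash
            grads[j][0] += (ambulances[j][0] - c[0]) / d
            grads[j][1] += (ambulances[j][1] - c[1]) / d
    return [(a[0] - lr * g[0], a[1] - lr * g[1]) for a, g in zip(ambulances, grads)]

def optimise(ambulances, crashes, lr=0.01, steps=500):
    """Run gradient descent and return the best placement found and its score."""
    best, best_score = list(ambulances), total_distance(ambulances, crashes)
    for _ in range(steps):
        ambulances = gradient_step(ambulances, crashes, lr)
        score = total_distance(ambulances, crashes)
        if score < best_score:
            best, best_score = list(ambulances), score
    return best, best_score

# Toy data, purely illustrative: two ambulances instead of six.
crash_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
start = [(2.0, 2.0), (8.0, 8.0)]
placement, score = optimise(start, crash_points)
```

The starting points matter a lot here, which is why the initial locations described above were worth the effort.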
Wow. Great work
Your solution is the best since you got the first place.
Thank you very much for sharing!