3 Mar 2020, 11:59

Meet the Winners of the Uber Movement SANRAL Cape Town Challenge

Zindi is excited to announce the winners of the Uber Movement SANRAL Cape Town Challenge. The challenge attracted 738 data scientists from across the continent and around the world, with 112 data scientists on the leaderboard.

The objective of the competition was to build a machine learning model that accurately predicts when and where the next road incident will occur in Cape Town, South Africa, using historic road incident data as well as traffic data from the Uber Movement platform. The resulting model will enable South African authorities to anticipate where they will be needed next and to put measures in place that will help ensure safety on Cape Town’s roads.

The winners of this challenge are: Cobus Burger from Stellenbosch, South Africa in 1st place; Team Gusi Lebedi from Russia in 2nd place; and Team Zindi Stars from Nigeria in 3rd place.

A special thank you to the winners for their generous feedback. Here are their insights:

Cobus Burger (1st Place)

Zindi handle: cobusburger

Where are you from? Stellenbosch, South Africa

Tell us a bit about yourself:

I work as a development economist at Stellenbosch University and as a senior data scientist at Predictive Insights.

Tell us a bit about the approach you took:

I used a single LightGBM model, which I tuned on the last three months of 2018.
Almost all the explanatory power of my model came from the camera data. I used historic data to approximate how many cars would be using each road segment for each hour of each day of the week. These forecasts were highly correlated with traffic incidents.
I also added daily deviations from these underlying patterns, although these did not add much. Weather and holiday data added less lift than I expected but were kept in the model, nonetheless.
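
As a rough illustration of the traffic-pattern idea described above, the sketch below averages historic vehicle counts per road segment, day of the week and hour, then measures how far a new observation deviates from that pattern. The record layout, the function names and the sample numbers are assumptions made for this example; they are not taken from the winning solution or from the competition data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_traffic_profile(records):
    """Average vehicle count per (segment, day-of-week, hour).

    `records` is an iterable of (segment_id, timestamp, vehicle_count)
    tuples; this layout is a hypothetical stand-in for the camera data.
    """
    buckets = defaultdict(list)
    for segment_id, ts, count in records:
        buckets[(segment_id, ts.weekday(), ts.hour)].append(count)
    return {key: mean(values) for key, values in buckets.items()}


def expected_traffic(profile, segment_id, ts, default=0.0):
    """Look up the typical count for this segment, weekday and hour."""
    return profile.get((segment_id, ts.weekday(), ts.hour), default)


def daily_deviation(profile, segment_id, ts, observed):
    """Difference between an observed count and the typical pattern."""
    return observed - expected_traffic(profile, segment_id, ts)


# Made-up history: two Mondays at 08:00 and one Monday at 09:00.
history = [
    ("S1", datetime(2018, 10, 1, 8), 120),
    ("S1", datetime(2018, 10, 8, 8), 100),
    ("S1", datetime(2018, 10, 1, 9), 80),
]
profile = build_traffic_profile(history)

# Typical traffic for a later Monday at 08:00, and the deviation
# if 130 vehicles were actually observed in that hour.
later_monday = datetime(2018, 10, 15, 8)
typical = expected_traffic(profile, "S1", later_monday)
deviation = daily_deviation(profile, "S1", later_monday, observed=130)
```

Both the typical count and the deviation could then be fed to a model as features, alongside weather and holiday indicators.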

What were the things that made the difference for you that you think others can learn from?

1. Trends
The number of recorded incidents increases dramatically over time. I suspect that the way the data was captured changed gradually over the training period. This meant that the data for 2018 was going to be more representative of our 2019 test period than data from 2017 or 2016. To account for this, I up-weighted more recent observations relative to older observations.
2. Camera Data
I used the camera names to distinguish between inbound and outbound cameras. I found this distinction to be very useful. I also down-weighted days for which there was no camera data, since these days were less informative.
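
The two weighting ideas above can be sketched as a single per-observation weight. The exponential decay, the one-year half-life and the 0.5 factor for days without camera data are all assumptions chosen for illustration; the interview does not say which weighting scheme was actually used.

```python
from datetime import date


def sample_weight(obs_date, has_camera_data, reference_date,
                  half_life_days=365.0, missing_camera_factor=0.5):
    """Weight an observation by recency and camera-data availability.

    Hypothetical scheme: the weight halves for every `half_life_days`
    of age, and is scaled down again when no camera data exists.
    """
    age_days = (reference_date - obs_date).days
    recency = 0.5 ** (age_days / half_life_days)
    availability = 1.0 if has_camera_data else missing_camera_factor
    return recency * availability


reference = date(2018, 12, 31)  # last day of the training period
w_recent = sample_weight(date(2018, 12, 31), True, reference)
w_year_old = sample_weight(date(2017, 12, 31), True, reference)
w_year_old_no_camera = sample_weight(date(2017, 12, 31), False, reference)
```

Weights like these would typically be passed to the model at training time, for example through the `sample_weight` argument of `fit` in LightGBM's scikit-learn interface.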

Team Gusi Lebedi: Evgeny & Aleksei (2nd Place)

Zindi handles: esingildinov & LexiBender

Where are you from? Russia

Tell us a bit about yourself:

Evgeny: Graduated from MIPT with a Master's degree in Applied Maths and Physics. He currently works as an SAP Consultant for IBM.
Aleksei: A PhD student who works full time as a Data Scientist at Neuromation (neu.ro).

Tell us a bit about the approach you took:

Evgeny: I focused on generating new features rather than trying different model parameters.
Aleksei: The strongest part of our solution was creative features.

What were the things that made the difference for you that you think others can learn from?

Aleksei: It is extremely important to fix random seeds, especially PYTHONHASHSEED!
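
A minimal sketch of what fixing seeds can look like in Python follows. Python randomises string hashes per process unless PYTHONHASHSEED is set, so anything that depends on the iteration order of sets of strings can differ between runs. The variable must be in the environment before the interpreter starts; setting it inside a running script only affects child processes. The helper names and the seed value are illustrative assumptions, and NumPy or model libraries have their own seeds that would need setting separately.

```python
import os
import random
import subprocess
import sys

SEED = 42  # arbitrary illustrative value


def seed_everything(seed):
    """Seed the stdlib RNG and export PYTHONHASHSEED for child processes."""
    random.seed(seed)
    # Changing this inside a running interpreter does NOT re-seed its own
    # string hashing; it only affects processes started afterwards.
    os.environ["PYTHONHASHSEED"] = str(seed)


def string_hash_in_fresh_interpreter(text, hash_seed):
    """Return hash(text) as computed by a new interpreter with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Re-seeding reproduces the same sequence of random draws.
seed_everything(SEED)
first_draw = [random.random() for _ in range(3)]
seed_everything(SEED)
second_draw = [random.random() for _ in range(3)]
```

In practice the simplest route is to launch the training script with the variable already set, for example `PYTHONHASHSEED=0 python train.py` (the script name here is a placeholder).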

What are you looking forward to most about the Zindi community?

It would be great if the community became more active in the comments and the forum.

Team Zindi Stars (3rd Place)

Zindi handles: Warrie_Warrie_dsn, Olawale0254, ooluwaseunabel, opsyDSN

Where are you from? Nigeria

Tell us a bit about yourself:

I have a bachelor's degree in Computer Engineering and I currently work as a Data Analyst at Data Science Nigeria. I am passionate about figuring out puzzles. I started my Data Science journey from Kaggle's kernels and discussion forums.

Tell us a bit about the approach you took:

EDA: Using QGIS, we did spatial analysis to understand the road network and figure out the major cause of incidents on the different road segments. We also did some analysis in Tableau Public, where we tried to understand how frequently incidents occurred in each month.
Feature Engineering: We generated and tried many features: time-based features; the value count of segmentId; the mean latitude and longitude of each segmentId; a KNN classifier on segmentId using 7 neighbours, with lags on those neighbours (we ended up not using it because it reduced our F1 score); weather data; and the suburb, obtained by reverse geocoding the latitude and longitude of each segmentId.
Local Validation Strategy: We used a Time Series Split with 10 folds from the sklearn library.
Modelling: We trained our data using a GCP machine. Our final model was an ensemble of a CatBoost classifier and a LightGBM classifier. Since the data was highly imbalanced, we optimised our F1 score by reducing the probability threshold to 0.065.
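
The validation and thresholding steps above can be sketched as follows. This is an illustration on synthetic data, not the team's code: the two probability arrays stand in for out-of-fold predictions from the CatBoost and LightGBM classifiers, the equal blend weights are an assumption, and no model is actually trained. It shows sklearn's `TimeSeriesSplit` with 10 folds and a sweep over probability thresholds to find the one with the best mean F1 score.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-ordered stand-in data (NOT the competition data).
rng = np.random.default_rng(0)
n_rows = 2200
y = (rng.random(n_rows) < 0.05).astype(int)   # rare positive class
X = np.arange(n_rows).reshape(-1, 1)          # placeholder features

# Hypothetical predicted probabilities from two models.
proba_a = np.clip(0.04 + 0.08 * y + rng.normal(0, 0.03, n_rows), 0, 1)
proba_b = np.clip(0.04 + 0.08 * y + rng.normal(0, 0.03, n_rows), 0, 1)
blend = 0.5 * proba_a + 0.5 * proba_b         # equal weights are an assumption

thresholds = np.round(np.arange(0.01, 0.51, 0.005), 3)
tscv = TimeSeriesSplit(n_splits=10)

fold_scores = []
for train_idx, valid_idx in tscv.split(X):
    # train_idx always precedes valid_idx in time; a real pipeline would
    # fit the models on train_idx before predicting on valid_idx.
    fold_scores.append([
        f1_score(y[valid_idx], (blend[valid_idx] >= t).astype(int),
                 zero_division=0)
        for t in thresholds
    ])

# Average F1 per threshold across folds, then pick the best threshold.
mean_f1 = np.mean(fold_scores, axis=0)
best_threshold = float(thresholds[int(np.argmax(mean_f1))])
print(f"best threshold: {best_threshold:.3f}, mean F1: {mean_f1.max():.3f}")
```

With a rare positive class, the threshold that maximises F1 usually sits well below the default of 0.5, which is consistent with the team's choice of 0.065.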

What were the things that made the difference for you that you think others can learn from?

Taking time to explore and understand the data was key. Handling outliers is very important, especially when dealing with time-series problems.

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

The application of AI in the transportation industry in Africa would be a tremendous boost to business, as would its application in the health sector to curb the spread of disease.

This competition was hosted by Uber Movement (movement.uber.com) & SANRAL (nra.co.za) in partnership with Stellenbosch University (www.sun.ac.za).

What are your thoughts on our winners' feedback? Engage via the Discussions page or leave a comment on social media.