Uber Movement SANRAL Cape Town Challenge
$5,500 USD
Predict when and where road incidents will occur next in Cape Town
806 data scientists enrolled, 134 on the leaderboard
Transportation, Safety, Prediction, Structured
South Africa
11 October 2019 to 10 February 2020
122 days

Incident data for Cape Town, South Africa, has been provided by the SANRAL Freeway Management System, and travel times between zones in Cape Town have been provided by Uber Movement.

The aim of this challenge is to forecast whether an incident will occur in each hour of each day on each 500 m road segment along the major roadways in Cape Town, from 1 January 2019 to 31 March 2019.
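
To give a sense of the size of the forecast window, here is a minimal sketch that enumerates the hourly slots. It assumes the window runs from 00:00 on 1 January 2019 up to and including 23:00 on 31 March 2019; the helper name `hourly_range` is illustrative and not part of the challenge material.

```python
from datetime import datetime, timedelta


def hourly_range(start, end):
    """Return every hourly timestamp from start to end, inclusive."""
    hours = []
    current = start
    while current <= end:
        hours.append(current)
        current += timedelta(hours=1)
    return hours


# Assumed window: the whole of 1 January 2019 through the whole of 31 March 2019.
forecast_hours = hourly_range(datetime(2019, 1, 1, 0), datetime(2019, 3, 31, 23))

# 31 + 28 + 31 = 90 days, each with 24 hourly slots.
print(len(forecast_hours))  # 2160
```

Every road segment needs one prediction per slot, so the number of rows to predict is this count multiplied by the number of segments.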

Files available for download

  • train.csv - the training file containing all reported incidents from 1 January 2016 to 31 December 2018. You will use this data to train your model.
  • train_VariableDefinitions.csv - Variable definitions for train
  • SampleSubmission.csv - an example of what your submission file should look like. The variable "datetime x segment_id" in the submission file is the date and time ("yyyy-mm-dd hh:mm:ss") + " x " + the road segment ID. The order of the rows does not matter, but the datetime x segment_id values must be correct. The column "prediction" is your prediction. The submission file is large, so please allow up to 30 minutes for your score to appear.
  • road_segemnts.zip - shapefile to create the road network for the Western Cape. These shapefiles show the unique road segments that the train file makes reference to. Each road segment is approximately 500 meters long.
  • Uber_movement_data.zip - read Uber Movement Data below
  • SANRAL_v2.zip - read SANRAL below
  • StarterNotebook_Catboost.ipynb - starter notebook
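
The submission ID format described for SampleSubmission.csv can be sketched as follows. The segment ID `SEG_A` and the 0/1 prediction values are placeholders chosen for illustration; check SampleSubmission.csv for the real segment IDs and the expected prediction values.

```python
import csv
import io
from datetime import datetime


def submission_id(timestamp, segment_id):
    """Build the "datetime x segment_id" value: "yyyy-mm-dd hh:mm:ss" + " x " + segment ID."""
    return f"{timestamp:%Y-%m-%d %H:%M:%S} x {segment_id}"


def write_submission(rows, out):
    """Write (timestamp, segment_id, prediction) tuples using the two submission columns."""
    writer = csv.writer(out)
    writer.writerow(["datetime x segment_id", "prediction"])
    for timestamp, segment_id, prediction in rows:
        writer.writerow([submission_id(timestamp, segment_id), prediction])


# Placeholder rows; "SEG_A" is a made-up segment ID.
example_rows = [
    (datetime(2019, 1, 1, 0), "SEG_A", 0),
    (datetime(2019, 1, 1, 1), "SEG_A", 1),
]
buffer = io.StringIO()
write_submission(example_rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass an object opened with `open("submission.csv", "w", newline="")` instead of the in-memory buffer.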

Get started with these blog posts that include tutorials.

Uber Movement Data

Uber Movement provides historic travel time between any two points in Cape Town. Any tables that are extracted from the Uber Movement platform can be used in your model. Read more about Uber Movement here.

Uber Movement provides the data for all quarters from Q1 2016 to Q1 2019.

Within each year, for each quarter there are three files:

  • Travel Times by Hour of Day (All Days)
  • Travel Times by Day of Week
  • Travel Times by Month (All Days)

Data sets include the arithmetic mean, geometric mean, and standard deviations for aggregated travel times over the selected date-range between every zone pair in the city.
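
As a reminder of how these summary statistics differ, the sketch below computes them for a handful of made-up travel times in seconds. The numbers are not taken from Uber Movement, and the use of the sample standard deviation is an assumption; the platform's exact definition is not stated here.

```python
import math
import statistics

# Made-up travel times (seconds) for one zone pair; one slow trip skews the data.
travel_times = [300.0, 320.0, 310.0, 900.0]

arithmetic_mean = statistics.mean(travel_times)

# Geometric mean: exponential of the mean of the logarithms.
geometric_mean = math.exp(statistics.mean([math.log(t) for t in travel_times]))

# Sample standard deviation (an assumption; a population figure would use pstdev).
standard_deviation = statistics.stdev(travel_times)

print(arithmetic_mean, round(geometric_mean, 1), round(standard_deviation, 1))
```

The geometric mean is pulled up less by the single slow trip than the arithmetic mean, which is why the two can be useful as separate features.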

Geo boundaries are available in geospatial (GeoJSON) format and include the Zone IDs used in the other export options. They can be found on Uber Movement under Cape Town, Download Data, Geo boundaries.

  • cape_town_travel_zones.json
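
A GeoJSON boundary file can be read with the standard library alone. The sketch below uses a tiny inline sample rather than the real file, and the property name `MOVEMENT_ID` is an assumption about how zone IDs are stored; inspect cape_town_travel_zones.json to confirm the actual keys.

```python
import json

# Tiny stand-in for the real boundary file; the property names are assumed.
sample_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"MOVEMENT_ID": "1", "DISPLAY_NAME": "Example zone"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[18.40, -33.90], [18.50, -33.90],
                                   [18.50, -34.00], [18.40, -34.00],
                                   [18.40, -33.90]]]}}
  ]
}
"""


def zone_centres(geojson_text, id_key="MOVEMENT_ID"):
    """Map zone ID to a rough centre: the mean of the outer-ring vertices."""
    collection = json.loads(geojson_text)
    centres = {}
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # MultiPolygon zones would need extra handling
        ring = geometry["coordinates"][0][:-1]  # drop the repeated closing vertex
        lon = sum(point[0] for point in ring) / len(ring)
        lat = sum(point[1] for point in ring) / len(ring)
        centres[feature["properties"][id_key]] = (lon, lat)
    return centres


centres = zone_centres(sample_geojson)
print(centres)
```

A vertex mean is only a rough centre, but it may be enough to match zones to nearby road segments as a first pass.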

*Note for Q4 2018: This selection may be missing data from 10/26/2018 - 10/30/2018

To download from the Uber Movement platform

Go to Uber Movement, Cities, Africa, Cape Town

  • Select your date range - 01/01/2016 - 03/31/2019
  • Select "Download data" at the bottom of the page

SANRAL Data

The SANRAL data was collected from the Traffic Management Centre (Goodwood, Cape Town) from 2016 to 2019.

Data contains

  • Injuries2016_2019.csv - contains the network ID, event ID (this corresponds to the event ID in train), local date and time the incident was created, number of injuries and injury type.
  • Vehicles2016_2019.csv - contains the network ID, event ID (this corresponds to the event ID in train), local date and time the incident was created, vehicle type (car, minibus, truck, etc.) and the vehicle colour.
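
Because both files share an event ID with train, they can be linked back to the training rows. The sketch below counts vehicles per event and attaches the count to each training row. The column names `EventId`, `segment_id` and `VehicleType`, and the sample rows, are hypothetical; take the real names from train_VariableDefinitions.csv and the files themselves.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature versions of train.csv and Vehicles2016_2019.csv.
train_csv = "EventId,segment_id\n100,SEG_A\n101,SEG_B\n"
vehicles_csv = "EventId,VehicleType\n100,car\n100,truck\n"


def vehicles_per_event(vehicles_file, key="EventId"):
    """Count how many vehicle rows share each event ID."""
    return Counter(row[key] for row in csv.DictReader(vehicles_file))


def attach_vehicle_counts(train_file, counts, key="EventId"):
    """Return training rows with a vehicle_count field (0 when no vehicles are listed)."""
    rows = []
    for row in csv.DictReader(train_file):
        row["vehicle_count"] = counts.get(row[key], 0)
        rows.append(row)
    return rows


counts = vehicles_per_event(io.StringIO(vehicles_csv))
enriched = attach_vehicle_counts(io.StringIO(train_csv), counts)
print(enriched)
```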

Vehicle detection sensor (VDS) data

  • For each month, there is a file (some months may be missing) containing the total count of vehicles per zone.
  • VDS zones are indicated on the Visual_CameraLoc&Names.pdf map, and their coordinates are contained in the Excel file VDS_locations.xlsx.
  • The raw data does not have column headings. Column headings can be found in the file VariableNames_VDS.csv.
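
Since the raw VDS files have no header row, the column names have to be supplied when reading them. The sketch below assumes that VariableNames_VDS.csv lists one column name per row and that the raw files are comma-separated; both are assumptions, and the column names and values shown are invented for illustration.

```python
import csv
import io

# Hypothetical stand-ins for VariableNames_VDS.csv and one raw monthly file.
variable_names_csv = "zone_id\ntimestamp\nvehicle_count\n"
raw_vds_csv = "Z1,2016-01-01 00:00:00,120\nZ2,2016-01-01 00:00:00,85\n"


def read_headerless(raw_file, names_file):
    """Pair each raw row with column names read from a separate file."""
    names = [row[0] for row in csv.reader(names_file) if row]
    return [dict(zip(names, row)) for row in csv.reader(raw_file) if row]


records = read_headerless(io.StringIO(raw_vds_csv), io.StringIO(variable_names_csv))
print(records[0])
```

If the names file turns out to hold all headings on a single row, read that first row instead of the first column.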

You can interact with the SANRAL traffic website here.

Weather Data

You may use weather in your model. Please suggest weather datasets that can be made available to everyone by writing to zindi@zindi.africa. We will assess whether the data should be allowed in this competition. We will post links to these new datasets below in this section and also announce them on the discussion forum. Until we respond, please assume that you are not allowed to use the dataset in your solution.

Additional Data

You may use ArcGIS and HERE data for land use and infrastructure; however, you may not use them for anything traffic-related.

Public holidays:

School terms: