The data consists of crashes identified by the World Bank DIME research team and by Flare. ‘Train.csv’ provides time and location for 6318 crashes in the training period (2018-01-01 to 2019-06-30).
Your task is to choose locations for six ambulances so as to minimize the distance to reported crashes. Each ambulance can be assigned a new location every three hours. Scoring is based on the distance from each crash in the test period to the nearest ambulance (see the Evaluation section for specifics).
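As a rough illustration of the objective, the sketch below scores a placement by summing, over all crashes, the distance to the nearest ambulance in the same three-hour window. It assumes plain Euclidean distance on raw latitude/longitude, and the names `window_start` and `placement_score` are hypothetical; the Evaluation section defines the actual metric.

```python
from datetime import datetime
from math import hypot


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its three-hour window."""
    return ts.replace(hour=ts.hour - ts.hour % 3, minute=0, second=0, microsecond=0)


def placement_score(crashes, placements):
    """Sum of distances from each crash to its nearest ambulance.

    crashes: iterable of (datetime, latitude, longitude)
    placements: dict mapping window start -> list of (latitude, longitude)
    Distance is Euclidean in degrees (an assumption, not the official metric).
    """
    total = 0.0
    for ts, lat, lon in crashes:
        ambulances = placements[window_start(ts)]
        total += min(hypot(lat - a_lat, lon - a_lon) for a_lat, a_lon in ambulances)
    return total


# Toy example with made-up coordinates (two ambulances per window for brevity).
crashes = [
    (datetime(2019, 7, 1, 1, 30), -1.25, 36.80),
    (datetime(2019, 7, 1, 4, 10), -1.30, 36.90),
]
placements = {
    datetime(2019, 7, 1, 0): [(-1.25, 36.80), (-1.40, 36.70)],
    datetime(2019, 7, 1, 3): [(-1.30, 36.80), (-1.00, 37.00)],
}
score = placement_score(crashes, placements)
```

A lower score means ambulances were, on average, closer to the crashes that occurred in each window.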
Additional data is also provided:
- Weather.csv has daily weather based on the GFS dataset.
- Segment_info.csv contains info on specific road segments. This includes information on physical characteristics such as the existence of crosswalks or obstacles in the road as well as behavioral characteristics such as people walking along the side of the road, all of which may be associated with the likelihood of a road traffic crash. The columns have been obfuscated but the data may still be useful. It can be linked to physical locations by joining with the geometry in segments_geometry.geojson. Some segments have separate rows for each side of the road, and so two rows in Segment_info.csv may map to the same road segment in segments_geometry.geojson.
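The join described above can be sketched with the standard library alone. The inline data below stands in for the real files. It is an assumption that segments_geometry.geojson stores the ID in a `segment_id` feature property, and the `side` values and the `79_76` values are made up for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Tiny stand-ins for Segment_info.csv and segments_geometry.geojson.
# Assumption: both files share a `segment_id` key.
segment_info_csv = """segment_id,side,79_76
101,1,0
101,2,1
102,1,0
"""
segments_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"segment_id": 101},
         "geometry": {"type": "LineString",
                      "coordinates": [[36.80, -1.25], [36.81, -1.26]]}},
        {"type": "Feature", "properties": {"segment_id": 102},
         "geometry": {"type": "LineString",
                      "coordinates": [[36.90, -1.30], [36.91, -1.31]]}},
    ],
})

# Index geometries by segment ID (as strings, to match the CSV values).
geometry_by_id = {
    str(feature["properties"]["segment_id"]): feature["geometry"]
    for feature in json.loads(segments_geojson)["features"]
}

# Group survey rows by segment so both sides attach to one geometry.
rows_by_segment = defaultdict(list)
for row in csv.DictReader(io.StringIO(segment_info_csv)):
    rows_by_segment[row["segment_id"]].append(row)

joined = {
    seg_id: {"geometry": geometry_by_id[seg_id], "rows": rows}
    for seg_id, rows in rows_by_segment.items()
    if seg_id in geometry_by_id
}
```

Grouping by segment first keeps the two-sided segments from duplicating geometry: segment 101 here ends up with one geometry and two survey rows.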
You are also allowed to use data from movement.uber.com, and are encouraged to do so. For example, you can get hourly average speeds for different routes from https://movement.uber.com/explore/nairobi/speeds. This data can be mapped to OpenStreetMap ways.
Note: Please use the Uber Movement data only from the training period or earlier, i.e. do not use 'future' data.
Files available for download
- Train.csv - contains crashes between 2018-01-01 and 2019-06-30, each of which has a location (latitude and longitude) and a time.
- Segment_info.csv - data from road segment surveys
- segments_geometry.geojson - geographical representations of the road segments above
- SampleSubmission.csv - example of the submission format. See the starter notebook for examples of creating your own.
- StarterNotebook.ipynb - a notebook showing the basics of loading the data, re-creating the scoring method and making your first submission.
Variable definitions
Train.csv:
- Uid - a unique ID
- Datetime - the date and time a crash occurred
- Latitude and Longitude - the location of the crash (not always exact due to the nature of the data collection)
Sample Submission:
- Date - a datetime column (3-hour intervals covering the test period). For example, 7/1/2019 0:00 indicates the ambulance locations from 7/1/2019 0:00 to 7/1/2019 2:59.
- A[N]_Latitude and A[N]_Longitude - used to place ambulance N at a specific location
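A minimal sketch of building rows in this layout follows. Several things here are assumptions rather than facts from the data description: ambulances are numbered from 0, the `Date` strings follow the unpadded "7/1/2019 0:00" style of the example above, and the end date is a placeholder. The helper name `submission_rows` is hypothetical. Check SampleSubmission.csv for the exact column names, date format and date range.

```python
import csv
import io
from datetime import datetime, timedelta


def submission_rows(start, end, locations):
    """Yield one row per three-hour window in [start, end).

    locations: list of (latitude, longitude), reused for every window.
    """
    current = start
    while current < end:
        # Unpadded month/day/hour, matching the example "7/1/2019 0:00".
        date_label = (f"{current.month}/{current.day}/{current.year} "
                      f"{current.hour}:{current.minute:02d}")
        row = {"Date": date_label}
        # Assumption: ambulance columns are numbered A0..A5.
        for n, (lat, lon) in enumerate(locations):
            row[f"A{n}_Latitude"] = lat
            row[f"A{n}_Longitude"] = lon
        yield row
        current += timedelta(hours=3)


# Six made-up ambulance locations; the end date is a placeholder.
locations = [(-1.25 - 0.01 * n, 36.80 + 0.01 * n) for n in range(6)]
rows = list(submission_rows(datetime(2019, 7, 1), datetime(2019, 7, 2), locations))

# Write to an in-memory buffer; swap in a real file handle to save to disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
```

This static placement uses the same six locations in every window; a real submission would vary `locations` per window where the crash patterns justify it.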
Weather Info:
- The weather data comes from the GFS dataset. Descriptions of the image bands used to generate this data can be found here.
Segment_info:
- segment_id - the unique ID of the specific road segment
- side - each road segment has up to two sides, i.e. traffic going in opposite directions
- The column headings are obfuscated, but relationships between them are preserved: for example, 79_76 and 79_65 are two related questions.