Uber Movement SANRAL Cape Town Challenge
$5,500 USD
Predict when and where road incidents will occur next in Cape Town
11 October 2019–2 February 2020 23:59
251 data scientists enrolled, 31 on the leaderboard
Some tutorials to get going
published 21 Oct 2019, 12:44

Hello all,

I've written a few posts on getting going with this contest. Part 1 (https://datasciencecastnet.home.blog/2019/10/19/zindi-uberct-part-1-getting-started/) basically recaps the starter notebook I shared earlier and is useful for getting a quick entry on the board. The second part (https://datasciencecastnet.home.blog/2019/10/21/zindi-uberct-part-2-stepping-up/) shares some next steps (adding features, using fast.ai) to boost the score (to >0.08 without much tweaking). Both have accompanying notebooks on Google Colab for easy duplication.

I'll be working on part 3, so please share any tips for things to include. Looking forward to questions and feedback :)

Thanks for making the notebook to get started.

Sweet, thank you for the starter code notebook and the blog posts. They're useful so far.

By the way, I get an error when I try to run this part of the notebook:

locations = data.groupby('road_segment_id').mean()[['longitude', 'latitude']]
locations.head(2)

The error is as follows:

KeyError: "['longitude'] not in index"

It seems that this could be a Pandas bug, resulting from the groupby function going funky. I have updated my Pandas but the error persists in the latest version. I verified that the longitude column is in the data object after loading, and that it disappears right after the groupby method is called, by running nothing but the groupby function and checking for the longitude column again. It disappears.

Turns out there wasn't a bug in Pandas after all, but the train.csv file has a few dirty data entries. I suppose data cleaning is inevitable, but just a heads up to anyone else who is pulling their hair out.
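For anyone who wants to see the effect in isolation, here is a minimal sketch of what seems to be going on. The tiny dataframe and the "oops" value are made up for illustration; the real train.csv has more columns and the dirty entries may look different. As far as I know, older Pandas versions silently dropped non-numeric columns in groupby().mean(), which numeric_only=True reproduces here.

```python
import pandas as pd

# Made-up miniature of train.csv: a single bad longitude value means the
# whole column holds strings rather than floats.
data = pd.DataFrame({
    "road_segment_id": ["A", "A", "B"],
    "longitude": ["18.5", "18.7", "oops"],  # hypothetical dirty entry
    "latitude": [-33.9, -33.7, -34.0],
})

# numeric_only=True keeps only numeric columns, so longitude drops out
# of the result and selecting it afterwards would raise a KeyError.
means = data.groupby("road_segment_id").mean(numeric_only=True)
print(list(means.columns))

# Find the rows whose longitude cannot be parsed as a number.
parsed = pd.to_numeric(data["longitude"], errors="coerce")
dirty = data[parsed.isna() & data["longitude"].notna()]
print(dirty)
```

Printing the dirty rows shows exactly which entries need fixing or dropping before the groupby.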

Here's a tip: after loading the CSV file into a dataframe (say, one called data), call data.info(). If your longitude and latitude columns are not float64 types, you are not going to have a good time.
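A rough sketch of that check plus one possible fix, assuming the dirty entries are simply values that cannot be parsed as numbers. The sample data is invented, and dropping the bad rows is my own choice here; you may prefer to repair them instead.

```python
import pandas as pd

# Invented sample standing in for train.csv.
data = pd.DataFrame({
    "road_segment_id": ["A", "A", "B"],
    "longitude": ["18.5", "18.7", "not_a_number"],  # hypothetical bad value
    "latitude": [-33.9, -33.7, -34.0],
})

coord_cols = ["longitude", "latitude"]

# Same information data.info() gives, but checked programmatically.
not_float = [c for c in coord_cols if not pd.api.types.is_float_dtype(data[c])]
print("Columns that are not float:", not_float)

# Convert to numbers; values that cannot be parsed become NaN.
for col in coord_cols:
    data[col] = pd.to_numeric(data[col], errors="coerce")

# Drop the rows that failed to parse, then the original groupby works.
clean = data.dropna(subset=coord_cols)
locations = clean.groupby("road_segment_id")[coord_cols].mean()
print(locations.head(2))
```

Selecting the two columns before calling mean() also avoids averaging columns you do not need.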