Uber Movement SANRAL Cape Town Challenge

Helping South Africa
$5 500 USD
Completed (~6 years ago)
Prediction
Anomaly Detection
Forecast
859 joined
133 active
Start
Oct 11, 19
Close
Feb 09, 20
Reveal
Feb 10, 20
Some tutorials to get going
Notebooks · 21 Oct 2019, 12:44 · 9

Hello all,

I've written a few posts on getting going with this contest. Part 1 (https://datasciencecastnet.home.blog/2019/10/19/zindi-uberct-part-1-getting-started/) basically recaps the starter notebook I shared earlier and is useful for getting a quick entry on the board. The second part (https://datasciencecastnet.home.blog/2019/10/21/zindi-uberct-part-2-stepping-up/) shares some next steps (adding features, using fast.ai) to boost the score (to >0.08 without much tweaking). Both have accompanying notebooks on Google Colab for easy duplication.

I'll be working on part 3, so please share any tips for things to include. Looking forward to questions and feedback :)

Discussion 9 answers
Raheem_Nasirudeen
The polytechnic ibadan

Thanks for making the starter notebook.

21 Oct 2019, 14:30
Upvotes 0

Sweet, thank you for the starter code notebook and the blog posts. They're useful so far.

6 Nov 2019, 14:24
Upvotes 0

By the way, I get an error when I try to run this part of the notebook:

locations = data.groupby('road_segment_id').mean()[['longitude', 'latitude']]
locations.head(2)

The error is as follows:

KeyError: "['longitude'] not in index"

It seems that this could be a Pandas bug, resulting from the groupby function going funky. I have updated my Pandas but the error persists in the latest version. I verified that the longitude column is in the data object after loading, and that it disappears right after the groupby method is called, by running nothing but the groupby function and checking for the longitude column again. It disappears.

7 Nov 2019, 13:27
Upvotes 0

Turns out there wasn't a bug in Pandas after all, but the train.csv file has a few dirty data entries. I suppose data cleaning is inevitable, but just a heads up to anyone else who is pulling their hair out.

Here's a tip: after loading the CSV file into a dataframe (say, one called data), call data.info(). If your longitude and latitude columns are not float64, you are not going to have a good time.
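For anyone who wants to track down the offending rows rather than just spot the wrong dtype, here is a minimal sketch. The tiny dataframe is a made-up stand-in for train.csv (the values and the name bad_rows are assumptions for illustration); only the column names and the Closed value come from this thread.

```python
import pandas as pd

# Toy stand-in for train.csv; values are invented for illustration.
data = pd.DataFrame({
    "road_segment_id": ["A", "A", "B"],
    # A single text entry is enough to stop pandas parsing the column as float64.
    "longitude": ["18.54", "Closed", "18.60"],
    "latitude": [-33.89, -33.89, -33.95],
})

# Same check as data.info(): longitude will not show up as float64 here.
print(data.dtypes)

# Coerce to numbers; values that cannot be parsed become NaN,
# which lets us list exactly which raw entries are dirty.
as_number = pd.to_numeric(data["longitude"], errors="coerce")
bad_rows = data[as_number.isna() & data["longitude"].notna()]
print(bad_rows["longitude"].unique())
```

Listing the unique bad values first is a cheap way to see whether there is one kind of dirty entry or several before deciding how to clean them.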

8 Nov 2019, 09:18
Upvotes 0

Hi DevilEars. Have you been able to find your way around the dirty data entries?

There's no other way around it than going ahead with data wrangling operations to reformat your data.

Yes, I just clean it up with data wrangling operations. I remove all the longitude entries with the value Closed, and then I change the dtype of the longitude column to float.
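A rough sketch of that cleanup, assuming Closed is the only bad value in the longitude column. The toy dataframe and the name clean are hypothetical; with the real data you would start from the dataframe loaded from train.csv.

```python
import pandas as pd

# Toy stand-in for train.csv; values are invented for illustration.
data = pd.DataFrame({
    "road_segment_id": ["A", "A", "B"],
    "longitude": ["18.54", "Closed", "18.60"],
    "latitude": [-33.89, -33.89, -33.95],
})

# Drop the rows whose longitude holds the text "Closed", then convert to float.
clean = data[data["longitude"] != "Closed"].copy()
clean["longitude"] = clean["longitude"].astype(float)

# Select the two coordinate columns before averaging, so only they are aggregated.
locations = clean.groupby("road_segment_id")[["longitude", "latitude"]].mean()
print(locations.head(2))
```

Picking the columns before calling mean() means the result does not depend on how pandas treats any other non-numeric columns in the file.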

Awesome. Thanks for sharing

1 Dec 2019, 10:03
Upvotes 0

Hi, could you please share the pyautogui code for automating the download of Uber data that you mentioned in part 3?

19 Jan 2020, 14:21
Upvotes 0