Traffic Jam: Predicting People's Movement into Nairobi
$12,000 USD
Uber and Mobiticket team up to predict demand for public transportation into Nairobi
6 September 2018–13 January 2019 23:59
576 data scientists enrolled, 204 on the leaderboard
Exploratory Data Analysis
published 29 Oct 2018, 11:50

Hi all,

I did some EDA on the data as I considered some features I wanted to use to build a model. It's all very rough and I look forward to hearing your thoughts on what might be improved.

It is impressive.

One question: why didn't you use the to_datetime() function when extracting 'hour_booked'?

I didn't think to :D It shouldn't make a difference, though. Or does it?

I'm not sure myself, but converting the travel_time field with pd.to_datetime() and then taking its hour component should give you the hour.

Something like this:

import pandas as pd

# Convert the travel_time strings to datetimes, then pull out the hour component
df["travel_time"] = pd.to_datetime(df["travel_time"])
df["hour_booked"] = df["travel_time"].dt.hour
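To check whether it makes a difference, here is a small sketch comparing plain string splitting with datetime parsing. The sample values are hypothetical and assume travel_time is stored as "H:MM" strings; the column names hour_split and parsed are made up for illustration.

```python
import pandas as pd

# Hypothetical travel_time values, assumed to be "H:MM" strings
df = pd.DataFrame({"travel_time": ["7:15", "7:12", "19:05", "5:10"]})

# Approach 1: split the string on ":" and keep the hour part as an int
df["hour_split"] = df["travel_time"].str.split(":").str[0].astype(int)

# Approach 2: parse to datetime with an explicit format, then take .dt.hour
parsed = pd.to_datetime(df["travel_time"], format="%H:%M")
df["hour_booked"] = parsed.dt.hour

print(df)
```

On clean data both give the same hours. The datetime route is stricter, since a malformed value raises an error instead of being silently split. It also gives you minutes and other components for free. Passing an explicit format avoids relying on format inference; in pandas 2.0 and later the infer_datetime_format argument is deprecated.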

This line of code:

for x in bpf.index: b.loc[b['ride_id'].isin([x]), 'p_filled'] = bpf[x]

What are you actually calculating?

Wow, I should put more comments in my code. It took me too long to figure out what I was doing. :D

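Here I was creating a variable called 'p_filled': the percentage of the vehicle that was filled before the ride left. The loop looks up each ride_id in bpf and copies that ride's value onto all of its rows in b. Hope that makes sense. Let me know if you have more questions.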
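The thread doesn't show how bpf was built, so the sketch below is only an assumption: one row per ticket in b, bpf computed as tickets per ride divided by a hypothetical max_capacity column, times 100. It also shows that the per-ride loop can be replaced by a single Series.map lookup, which produces the same column.

```python
import pandas as pd

# Hypothetical ticket-level data: one row per ticket sold (assumed layout).
# max_capacity is an assumed column holding the vehicle's seat count.
b = pd.DataFrame({
    "ride_id": [1, 1, 1, 2, 2, 3],
    "max_capacity": [11, 11, 11, 49, 49, 11],
})

# Assumed definition of bpf: percentage filled per ride, indexed by ride_id
tickets_per_ride = b.groupby("ride_id").size()
capacity = b.groupby("ride_id")["max_capacity"].first()
bpf = 100 * tickets_per_ride / capacity

# Loop version (as in the thread): write each ride's value onto its rows
loop = b.copy()
for x in bpf.index:
    loop.loc[loop["ride_id"].isin([x]), "p_filled"] = bpf[x]

# Vectorized equivalent: look up each row's ride_id in the bpf Series
b["p_filled"] = b["ride_id"].map(bpf)

print(b)
```

The map version does one lookup pass instead of one boolean mask per ride, so it scales much better when there are thousands of ride_ids.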

Okay, thanks.