I did some EDA on the data while considering which features to use in a model. It's all very rough, and I look forward to hearing your thoughts on what might be improved.
It is impressive.
One question: why didn't you use the to_datetime() function when extracting 'hour_booked'?
I didn't think to :D It shouldn't make a difference though, should it?
I'm not sure myself, but taking the hour part of the converted travel_time field should give you the hour.
Something like this:
df["travel_time"] = pd.to_datetime(df["travel_time"], infer_datetime_format=True)
df["hour_booked"] = df["travel_time"].dt.hour
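To check whether it makes a difference, here is a minimal sketch comparing the two approaches. The sample values and the "%H:%M" format are assumptions for illustration, and the string-splitting version is only a guess at how the hour might have been extracted originally; the real travel_time column may look different.

```python
import pandas as pd

# Hypothetical sample; the real travel_time values may be formatted differently.
df = pd.DataFrame({"travel_time": ["07:15", "19:05", "23:10"]})

# Option 1: take the text before the colon and cast it to an integer.
hour_from_string = df["travel_time"].str.split(":").str[0].astype(int)

# Option 2: parse to datetime first, then read the hour via the .dt accessor.
parsed = pd.to_datetime(df["travel_time"], format="%H:%M")
hour_from_datetime = parsed.dt.hour

# True when both approaches agree on every row.
same = (hour_from_string == hour_from_datetime).all()
print(same)
```

On clean data like this, both give the same hours. The datetime route mainly helps if you also want minutes, weekdays, or other parts later, since the parsing is done once.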
About this line of code:
for x in bpf.index: b.loc[b['ride_id'].isin([x]), 'p_filled'] = bpf[x]
What are you actually calculating?
Wow, I should put more comments in my code. Took me too long to figure out what I was doing. :D
Here I was creating a variable called 'p_filled': the percentage of the ride's seats that were filled before it left. Hope that makes sense. Let me know if you have more questions.
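For anyone following along, here is a rough sketch of what that loop does, plus a shorter way to get the same column. The toy table, the max_capacity column, and the way bpf is built are all assumptions for illustration; the thread only shows that bpf is indexed by ride_id.

```python
import pandas as pd

# Hypothetical bookings table: one row per ticket sold.
b = pd.DataFrame({
    "ride_id": [1, 1, 1, 2, 2, 3],
    "max_capacity": [4, 4, 4, 8, 8, 2],
})

# Assumed definition of bpf: tickets sold per ride as a percentage of capacity.
tickets = b.groupby("ride_id").size()
capacity = b.groupby("ride_id")["max_capacity"].first()
bpf = tickets / capacity * 100

# The loop from the thread, run on a copy: for each ride, write its
# percentage into every row belonging to that ride.
looped = b.copy()
for x in bpf.index:
    looped.loc[looped["ride_id"].isin([x]), "p_filled"] = bpf[x]

# Same result without the loop: look up each row's ride_id in bpf.
b["p_filled"] = b["ride_id"].map(bpf)
print(b)
```

With many rides, map() should be noticeably faster than the loop, because the loop scans the whole table once per ride.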