Has anyone been able to use the Uber Movement data? I'm wondering how it links to the Mobiticket data. How do we combine the two to predict the number of tickets?
You can download average trip times for a given route from movement.uber.com as a CSV file. You can only get 3 months at a time, so you'll need to download several files and concatenate them. There is a date column, so to integrate this data with the ticket data I merge it with a dataset generated from the competition data, merging on the 'Date' column. Depending on how you do this, you can end up with the Mobiticket data plus an extra column holding the average trip time in Nairobi for that day. I didn't see much improvement in my predictions when adding this information, but that could be due to a poor choice of route or some other effect. A good next step would be to get the average trip times for several different routes in Nairobi and combine them into a better overall indicator of traffic.
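The concatenate-then-merge steps above can be sketched with the standard library. With pandas, the usual route would be `pd.concat` over the files followed by `merge(..., on="Date", how="left")`. This is a minimal sketch, not the poster's actual code. The column names 'Date' and 'Daily Mean Travel Time (Seconds)' come from this thread. The date format and the shape of the ticket rows (a `date` key per row) are assumptions, as are the helper names `load_uber_chunks` and `add_travel_time`.

```python
import csv
import io
from datetime import date, datetime

# Column names as they appear in a per-route Uber Movement export (per this thread).
DATE_COL = "Date"
TIME_COL = "Daily Mean Travel Time (Seconds)"


def load_uber_chunks(files, date_format="%m/%d/%Y"):
    """Concatenate several 3-month Uber Movement CSV exports into a single
    {date: daily mean travel time in seconds} mapping.

    Hypothetical helper; date_format is an assumption, so check it
    against the files you actually download.
    """
    travel_time = {}
    for f in files:
        for row in csv.DictReader(f):  # any extra columns are ignored
            day = datetime.strptime(row[DATE_COL], date_format).date()
            # If two downloads overlap, the later file wins for that day.
            travel_time[day] = float(row[TIME_COL])
    return travel_time


def add_travel_time(ticket_rows, travel_time):
    """Left merge on the date: every ticket row is kept, and days with
    no Uber data get None in the new column (hypothetical helper)."""
    return [
        {**row, "uber_mean_travel_time": travel_time.get(row["date"])}
        for row in ticket_rows
    ]


if __name__ == "__main__":
    # Two tiny in-memory stand-ins for downloaded 3-month files.
    q1 = io.StringIO("Date,Daily Mean Travel Time (Seconds)\n01/01/2018,1500\n01/02/2018,1620\n")
    q2 = io.StringIO("Date,Daily Mean Travel Time (Seconds)\n04/01/2018,1710\n")
    times = load_uber_chunks([q1, q2])
    # Hypothetical ticket rows, one per day, aggregated from the competition data.
    tickets = [{"date": date(2018, 1, 2), "tickets": 14}, {"date": date(2018, 5, 1), "tickets": 9}]
    for row in add_travel_time(tickets, times):
        print(row)
```

For real downloads, pass file handles opened with `open(path, newline="")` instead of the `StringIO` stand-ins.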
Which CSV are you downloading? The one I have has no date column. Or are you selecting a specific source and destination?
I am picking a specific route. The resulting CSV file has a 'Date' column and a 'Daily Mean Travel Time (Seconds)' column (plus many more).
I haven't got to it yet, but the challenge info suggests there might be a correlation (positive or negative) between traffic congestion and ticket sales. I'm also working on the assumption that, because Uber Movement data is historical, you cannot use any data you downloaded from there for the test period: at the time of prediction, that data would not yet be available.
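One way to respect that constraint is to drop every Uber day after the end of the training period. The feature for a given date is then built only from days strictly before it. This is a sketch under assumptions: daily travel times are held in a `{date: seconds}` dict, and the cutoff date, the 7-day window, and the helper names `restrict_to_cutoff` and `recent_mean_travel_time` are illustrative choices, not anything the competition specifies.

```python
from datetime import date


def restrict_to_cutoff(travel_time, cutoff):
    """Keep only days on or before `cutoff` (e.g. the last training date),
    so no test-period Uber data can leak into the features."""
    return {d: v for d, v in travel_time.items() if d <= cutoff}


def recent_mean_travel_time(travel_time, day, window=7):
    """Mean of up to `window` of the most recent daily values strictly
    before `day`; returns None if there is no earlier data."""
    past = sorted(d for d in travel_time if d < day)[-window:]
    if not past:
        return None
    return sum(travel_time[d] for d in past) / len(past)


if __name__ == "__main__":
    # Made-up values purely for illustration.
    tt = {date(2018, 1, 1): 100.0, date(2018, 1, 2): 200.0,
          date(2018, 1, 3): 300.0, date(2018, 2, 1): 999.0}
    safe = restrict_to_cutoff(tt, cutoff=date(2018, 1, 31))
    # A test-period day only sees training-period values.
    print(recent_mean_travel_time(safe, date(2018, 2, 5), window=2))
```

Training rows would use the same function, so the feature is built the same way in training and at prediction time.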
[I replied to a cached version of this page, so editing to agree with everything @Johnowhitaker said :)]
I think the Uber Movement data is provided in case you want to do some feature engineering, i.e. create extra features from it to improve your model.
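One possible engineered feature is the multi-route traffic indicator suggested earlier in the thread. This sketch assumes you have downloaded several routes, each as a `{date: seconds}` dict. It divides each route by its own median so that long and short routes count equally, then averages those ratios per day. The median normalization and the name `congestion_index` are my own choices, not anything from Uber or the competition.

```python
from datetime import date
from statistics import median


def congestion_index(routes):
    """routes: {route_name: {date: daily mean travel time in seconds}}.

    Each route is scaled by its own median (1.0 = a typical day on that
    route); the index for a day is the mean ratio over the routes that
    have data for that day.
    """
    ratios = {}
    for series in routes.values():
        typical = median(series.values())
        for d, seconds in series.items():
            ratios.setdefault(d, []).append(seconds / typical)
    return {d: sum(r) / len(r) for d, r in ratios.items()}


if __name__ == "__main__":
    # Made-up travel times for two hypothetical routes.
    routes = {
        "route_a": {date(2018, 1, 1): 100.0, date(2018, 1, 2): 200.0, date(2018, 1, 3): 300.0},
        "route_b": {date(2018, 1, 1): 1000.0, date(2018, 1, 2): 1000.0},
    }
    for d, value in sorted(congestion_index(routes).items()):
        print(d, round(value, 3))
```

Values above 1.0 indicate busier-than-usual days across the routes you picked. The same cutoff caution about test-period data applies to this feature too.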