Traffic Jam: Predicting People's Movement into Nairobi
$12,000 USD
Uber and Mobiticket team up to predict demand for public transportation into Nairobi
6 September 2018–13 January 2019 23:59
687 data scientists enrolled, 204 on the leaderboard
Target Value
published 17 Nov 2018, 14:58

But seriously guys, I have been looking at the dataset for a while now and I am still very confused. Where is the target value (number of tickets)? Or are we supposed to generate it ourselves and then use it for training?

Hi. The number of tickets is determined by the number of people on a specific bus at a specific time, which you can infer from the given data. If you can't make that bit up........ 🤷🏿‍♂️

I think your question might deserve more credit than the response by @dkaila. If everyone "makes that bit up", trivially or otherwise, how are they going to declare a winner? I'm assuming they will validate the results on the test set in some way?

Different people are bound to make different adjustments to the data in order to clean it. Without a given target, your whole data-to-information pipeline is backward if the answer is then validated against an assumed target, because the target is in this case itself a function of the data.

Now, I haven't attempted this challenge myself, but it seems to me that if 5 people calculated the response differently, they could all report high accuracies on their own data but receive ambiguous scores on your test results, all due to the response itself being treated as "trivial"?

Now, I understand this should be trivial. I'm just remarking that the answer to this question may not be so simple as to warrant a retort like `If you can't make that bit up........ 🤷🏿‍♂️`.


The data doesn't explicitly have a target value; you create it. Group by the ride_id column and aggregate by count, and you will get the number of tickets that were sold per ride_id. Or just follow the notebook that was shared by one of us.
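For anyone who wants to see the groupby step spelled out, here is a minimal pandas sketch on toy data. Only the ride_id column comes from this thread; the seat_number column, the sample values, and the target name number_of_tickets are made up for illustration, so adapt them to the real file.

```python
import pandas as pd

# Toy stand-in for the training file: one row per ticket sold.
# Only ride_id is taken from the thread; seat_number and all values are hypothetical.
tickets = pd.DataFrame({
    "ride_id": [101, 101, 101, 102, 103, 103],
    "seat_number": ["1A", "2B", "3C", "1A", "4D", "5E"],
})

# Each row is one ticket, so the number of rows per ride_id is the target.
# "number_of_tickets" is an assumed column name.
target = (
    tickets.groupby("ride_id")
    .size()
    .reset_index(name="number_of_tickets")
)

# Drop the per-ticket column, keep one row per ride, and attach the target.
rides = tickets.drop(columns=["seat_number"]).drop_duplicates()
train = rides.merge(target, on="ride_id")
print(train)
```

Since the target is just a row count per ride_id, it should not depend on how the rest of the columns are cleaned, as long as no ticket rows are dropped beforehand.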

If you are interested in R, you can use dplyr:

library(dplyr)

# Assumes `data` already holds the training data, one row per ticket
# Count the rows (tickets) for each ride_id
Tickets_data <- data %>%
  group_by(ride_id) %>%
  summarise(Total = n()) %>%
  arrange(ride_id)

# Merge the counts back onto the original data frame
merged_data <- merge(data, Tickets_data, by = "ride_id")

# Drop columns 2 to 4 by position (presumably the per-ticket fields) so that duplicate rides can be collapsed
merged_data <- merged_data[, -c(2, 3, 4)]

# Finally, make it unique so that you don't have duplicate ride_ids
c_merged_data <- unique(merged_data)

# Save it as CSV so that it can be read in Python
write.csv(c_merged_data, file = "train_revised2.csv")

Hello Stefan, you won't get different targets, since we are all using the count of rows per ride_id.

Thanks for this helpful information.