
Adbot Ad Engagement Forecasting Challenge

Helping South Africa
$500 USD
Completed (almost 2 years ago)
Forecast
452 joined
113 active
Start: Apr 04, 24
Close: May 19, 24
Reveal: May 19, 24
Multiple entries for the same date and same ID
Data · 7 Apr 2024, 08:40 · 7

Why are multiple entries present for the same date? How should we consider that for modelling?

The same ID has multiple values for impressions! How is it possible?

Discussion 7 answers
AdeptSchneider22
Kenyatta University

The ID has multiple values for impressions because the data records daily impressions. The train.csv data contains daily impressions for each unique ID from 2020 to 2024, and the challenge is a time series forecasting task.

7 Apr 2024, 09:21
Upvotes 1

I get that it's a time series forecasting challenge and that it has daily impressions.

But for 1 ID and 1 date, only 1 entry should be present, right? Or am I missing something?

For your reference, consider the ID 'ID_5da86e71bf5dee4cf5047046': it has 6 entries for the date '2020-01-01'.
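A check like this can be run across the whole training set rather than one ID at a time. The snippet below is only a minimal sketch: the toy rows stand in for train.csv, and the column names `ID`, `date`, and `impressions` as well as all values are assumptions for illustration, not taken from the actual file.

```python
import pandas as pd

# Toy rows standing in for train.csv. Column names and values are
# assumptions made for illustration only.
train = pd.DataFrame({
    "ID": ["ID_a", "ID_a", "ID_a", "ID_b"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01"],
    "impressions": [10, 25, 7, 3],
})

# Count how many rows exist for every (ID, date) pair.
rows_per_day = (
    train.groupby(["ID", "date"])
    .size()
    .rename("n_rows")
    .reset_index()
)

# Keep only the pairs that appear more than once.
repeated = rows_per_day[rows_per_day["n_rows"] > 1]
print(repeated)
```

With the real data, the same grouping would list every ID and date that has more than one entry, which gives a sense of how common the pattern is.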

AdeptSchneider22
Kenyatta University

For that instance, you can add an hours column. If you look at it, you'll see that the impressions were recorded at different hours.

Nelly43
Zindi

Hello,

Just to clarify, the training data is made up of daily entries related to clients' ads, and on some dates a client would have more than one ad on display at a time.

This information could be useful in building your model. However, the main objective of the challenge is forecasting the total number of clicks a client would get in the future.

`clients would have more than one ad on display at a time.` If that's the case, then there should be an identifier for the ads. I mean why do we even have the ID field then!?

I assumed that it's a snapshot of the number of clicks at a certain point in the day. Based on the EDA that I have done on the dataset, I feel my assumption is right.

Please correct me if I am wrong. Attaching an example to support the point would be really helpful.

Nelly43
Zindi

The IDs are for each unique client on the platform and a client can run multiple ads concurrently, or at different times of day, on the same date.

You'll notice that the keyword and description lengths can be different for entries made on the same date. This can therefore be used to distinguish the unique ads and inform your model.
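One way to act on this, sketched under stated assumptions: treat the combination of length features as a rough ad signature and count distinct signatures per client and date. The column names `keyword_len` and `description_len` are hypothetical placeholders for whatever the length columns are called in train.csv, and the rows are made up. Two different ads could share the same lengths, so this is only a proxy and not a true ad identifier.

```python
import pandas as pd

# Toy rows; "keyword_len" and "description_len" are hypothetical names
# for the length features, and all values are invented.
train = pd.DataFrame({
    "ID": ["ID_a", "ID_a", "ID_a", "ID_b"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01"],
    "keyword_len": [12, 20, 12, 12],
    "description_len": [80, 95, 80, 80],
    "clicks": [3, 5, 4, 1],
})

# Columns assumed to stay constant for a given ad.
sig_cols = ["keyword_len", "description_len"]

# Give each (client, signature) combination an integer label.
train["ad_key"] = train.groupby(["ID"] + sig_cols).ngroup()

# Number of distinct signatures a client shows on each date.
n_ads = train.groupby(["ID", "date"])["ad_key"].nunique()
print(n_ads)
```

The resulting `ad_key` could serve as a per-ad series key, or `n_ads` could be used as a feature describing how many ads a client ran on a given day.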

Features such as these are the primary reason the different ads were separated and included in the training set, and you should be able to use this information in building your model.

However, it is also possible to aggregate the data for dates with multiple entries and use the totals instead, since the challenge is focused on the total number of clicks.
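The aggregation route could look roughly like the sketch below. The column names `ID`, `date`, `impressions`, and `clicks` are assumptions, and the rows are invented; summing is one reasonable choice for count-like columns, while other columns may need a different aggregate.

```python
import pandas as pd

# Toy rows standing in for train.csv. Column names and values are
# assumptions made for illustration only.
train = pd.DataFrame({
    "ID": ["ID_a", "ID_a", "ID_a", "ID_b"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01"],
    "impressions": [10, 25, 7, 3],
    "clicks": [3, 5, 4, 1],
})

# Collapse multiple entries per client and date into a single daily total.
daily = train.groupby(["ID", "date"], as_index=False)[["impressions", "clicks"]].sum()
print(daily)
```

After this step each client has at most one row per date, which is the shape most time series forecasting approaches expect.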

This makes sense. Thanks a lot for the in-depth explanation and for being patient!