
Adbot Ad Engagement Forecasting Challenge

Helping South Africa · $500 USD · Completed

Skills you will learn: Forecast

452 joined · 113 active


Runs: Apr 04, 2024 to May 19, 2024 · Reveal: May 19, 2024

paradoxx

Multiple entries for the same date and same ID

Data · 7 Apr 2024, 08:40 · 7

Why are there multiple entries for the same date? How should we handle that when modelling?

The same ID has multiple values for impressions! How is that possible?

Discussion 7 answers

AdeptSchneider22

Kenyatta University

The ID has multiple values for impressions because the data records daily impressions. The train.csv file contains daily impressions for each unique ID from 2020 to 2024, and the challenge is time series forecasting.

7 Apr 2024, 09:21

Upvotes 1

paradoxx

I get that it's a time series forecasting challenge and that it has daily impressions.

But for one ID and one date, there should be only one entry, right? Or am I missing something?

For reference, consider ID 'ID_5da86e71bf5dee4cf5047046': it has 6 entries for the date '2020-01-01'.

replied to AdeptSchneider22 · 7 Apr 2024, 10:05

Upvotes 1
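To reproduce the observation above, one can count how many rows share each (ID, date) pair. This is a minimal sketch, not reference code from the challenge: it assumes train.csv is read into dicts with columns named `ID` and `date` (assumed names) and that dates are ISO strings, so the first 10 characters give the day even if a time of day is attached.

```python
from collections import Counter


def duplicated_id_dates(rows):
    """Return {(ID, day): count} for pairs that occur more than once.

    rows: iterable of dicts, e.g. from csv.DictReader on train.csv.
    The 'ID' and 'date' column names are assumptions; the day is taken
    as the first 10 characters of an ISO 'YYYY-MM-DD...' date string.
    """
    counts = Counter((row["ID"], row["date"][:10]) for row in rows)
    return {pair: n for pair, n in counts.items() if n > 1}
```

For example, after `import csv`, passing `csv.DictReader(open("train.csv", newline=""))` to `duplicated_id_dates` would list every client/day combination that has more than one row.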

AdeptSchneider22

Kenyatta University

In that case, you can add an hour column. If you look at it, you'll see that the impressions were recorded at different hours.

replied to paradoxx · 7 Apr 2024, 10:18

Upvotes 2
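If the date field carries a time of day, the hour column suggested above can be derived with the standard library. This sketch assumes the field is named `date` (an assumption) and holds ISO 8601 timestamps such as `2020-01-01 13:00:00`; if it holds only dates, every row gets hour 0 and the new column will not separate the entries.

```python
from datetime import datetime


def add_hour_column(rows, ts_key="date"):
    """Return copies of rows with an extra 'hour' field parsed from ts_key.

    ts_key is a hypothetical column name; values must be ISO 8601 strings
    accepted by datetime.fromisoformat. Date-only strings parse as midnight.
    """
    out = []
    for row in rows:
        ts = datetime.fromisoformat(row[ts_key])
        out.append({**row, "hour": ts.hour})  # copy, do not mutate input
    return out
```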

Nelly43

Zindi

Hello,

Just to clarify, the training data consists of daily entries related to clients' ads, and on some dates a client had more than one ad on display at a time.

This information could be useful in building your model; however, the main objective of the challenge is to forecast the total number of clicks a client will get in the future.

replied to AdeptSchneider22 · 8 Apr 2024, 08:40

Upvotes 0

paradoxx

"Clients would have more than one ad on display at a time." If that's the case, there should be an identifier for the ads. I mean, why do we even have the ID field then?

I assumed that each entry is a snapshot of the number of clicks at a certain point in the day, and based on the EDA I have done on the dataset, I believe my assumption is right.

Please correct me if I am wrong. Attaching an example to support the point would be really helpful.

replied to Nelly43 · 8 Apr 2024, 09:37

Upvotes 1
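One way to test the snapshot reading is to check whether clicks ever decrease within a day: if each row were a running total taken at some point in the day, values sorted by time should never go down. This is a rough diagnostic sketch assuming hypothetical `ID`, `date` (ISO timestamp) and `clicks` columns; it is only informative if the timestamps actually differ within a day.

```python
from collections import defaultdict
from datetime import datetime


def non_monotonic_days(rows, ts_key="date", value_key="clicks"):
    """Return the (ID, day) groups whose values decrease at some point in the day.

    Under the 'snapshot' reading each row is a running total, so values
    sorted by timestamp should be non-decreasing. Any group returned here
    is evidence against that reading. Column names are assumptions.
    """
    groups = defaultdict(list)
    for row in rows:
        ts = datetime.fromisoformat(row[ts_key])
        groups[(row["ID"], ts.date())].append((ts, float(row[value_key])))
    bad = []
    for key, points in groups.items():
        values = [value for _, value in sorted(points)]  # order by timestamp
        if any(later < earlier for earlier, later in zip(values, values[1:])):
            bad.append(key)
    return bad
```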

Nelly43

Zindi

The IDs are for each unique client on the platform and a client can run multiple ads concurrently, or at different times of day, on the same date.

You'll notice that the keyword and description lengths can be different for entries made on the same date. This can therefore be used to distinguish the unique ads and inform your model.

Features such as these are the primary reason the different ads were kept as separate entries in the training set, and you should be able to use this information in building your model.
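As a rough illustration of using those features, the number of distinct ads a client ran on a given day can be estimated by counting distinct (keyword length, description length) combinations. The column names `keyword_length` and `description_length` are hypothetical placeholders for the dataset's actual fields, and two ads with identical lengths collapse into one, so the count is a lower bound.

```python
from collections import defaultdict


def ads_per_id_day(rows, feature_keys=("keyword_length", "description_length")):
    """Estimate how many distinct ads each client ran per day.

    Rows that share an ID and day but differ in any of feature_keys are
    counted as separate ads. The default column names are hypothetical;
    ads with identical feature values are merged, so this is a lower bound.
    """
    signatures = defaultdict(set)
    for row in rows:
        day = row["date"][:10]  # assumes ISO 'YYYY-MM-DD...' date strings
        signatures[(row["ID"], day)].add(tuple(row[k] for k in feature_keys))
    return {key: len(sigs) for key, sigs in signatures.items()}
```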

However, it is also possible to aggregate the data for dates with multiple entries and use the totals instead since the challenge is focused on the total number of clicks.

replied to paradoxx · 8 Apr 2024, 11:25

Upvotes 2
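The aggregation route can be sketched as summing clicks over all rows that share an ID and day, which yields one daily series per client. This assumes hypothetical `ID`, `date` and `clicks` columns with ISO dates, and it relies on the rows being separate ads as explained above; summing cumulative snapshots would overcount.

```python
from collections import defaultdict


def daily_totals(rows, value_key="clicks"):
    """Sum value_key over all rows sharing an (ID, day) pair.

    Returns {ID: [(day, total), ...]} with days in ascending order, i.e.
    one daily series per client. Column names are assumptions and the
    day is the first 10 characters of an ISO 'YYYY-MM-DD...' string.
    """
    totals = defaultdict(float)
    for row in rows:
        day = row["date"][:10]
        totals[(row["ID"], day)] += float(row[value_key])
    series = defaultdict(list)
    for (client_id, day), total in sorted(totals.items()):
        series[client_id].append((day, total))
    return dict(series)
```

Sorting the (ID, day) keys before building the lists keeps each client's series in chronological order, which most forecasting tools expect.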

paradoxx

This makes sense. Thanks a lot for the in-depth explanation and for being patient!

replied to Nelly43 · 8 Apr 2024, 11:41

Upvotes 2
