🎥 Must-Read: Num_cols Observations

Adbot Ad Engagement Forecasting Challenge

Helping South Africa

$500 USD

Completed (~2 years ago)

Start

Apr 04, 24

May 19, 24

Reveal

May 19, 24

Jaw22

Zindi africa

Num_cols Observations

Help · 9 Apr 2024, 14:46 · 0

My findings so far:

- With a raw LR model (0 imputation), the Breusch-Pagan test gives strong evidence of heteroskedasticity.
- The continuous num_cols are all right-skewed (including the target), except for ad_description_len, which is left-skewed.
- Very strong evidence of outliers: 3 cols have more than 30,000 outliers, 3 cols more than 20,000, one col more than 10,000 and one col more than 1,000. Deleting all the outliers will probably halve the train set. What implications does that have for preds and LB performance?
- Most num_cols already contain zeros ('0'), so imputing with zero will create anomalies/bias, because an imputed zero cannot be told apart from a real one. The challenge is to develop an imputation strategy that complements and enhances your algorithm choice.

Just sharing findings with peeps. Happy coding and competing, my fellow Zindians. Winter is coming!!!!
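For anyone who wants to run the same kind of checks, here is a rough sketch in pandas. It is not the exact method behind the numbers above: the 1.5 × IQR rule for counting outliers, the median fill, the `_was_missing` flag columns and the toy column names are all assumptions made for illustration. It reports skew, outlier, zero and missing counts per column, estimates how many rows you would lose by dropping every outlier row, and imputes in a way that keeps real zeros distinguishable from filled values.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the classic 1.5*IQR rule. NaN values are never flagged.
    """
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def summarize_num_cols(df: pd.DataFrame, num_cols: list) -> pd.DataFrame:
    """Per-column skewness plus counts of outliers, zeros and missing values."""
    rows = {}
    for col in num_cols:
        s = df[col]
        rows[col] = {
            "skew": float(s.skew()),  # > 0 right-skewed, < 0 left-skewed
            "n_outliers": int(iqr_outlier_mask(s).sum()),
            "n_zeros": int((s == 0).sum()),
            "n_missing": int(s.isna().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


def share_of_rows_with_outliers(df: pd.DataFrame, num_cols: list) -> float:
    """Fraction of rows that would be dropped if every outlier row were deleted."""
    masks = pd.concat([iqr_outlier_mask(df[c]) for c in num_cols], axis=1)
    return float(masks.any(axis=1).mean())


def impute_with_flag(df: pd.DataFrame, num_cols: list) -> pd.DataFrame:
    """Median-fill missing values and add a 0/1 '<col>_was_missing' column.

    Assumption: median fill is only one option. The flag is what keeps
    filled values separable from genuine zeros.
    """
    out = df.copy()
    for col in num_cols:
        out[f"{col}_was_missing"] = out[col].isna().astype(int)
        out[col] = out[col].fillna(out[col].median())
    return out


# Toy data with hypothetical column names, standing in for the real train set.
rng = np.random.default_rng(0)
mostly_zero = rng.poisson(0.3, size=1000).astype(float)
mostly_zero[::10] = np.nan  # knock out every 10th value
toy = pd.DataFrame({
    "right_skewed": rng.lognormal(size=1000),
    "mostly_zero": mostly_zero,
})
cols = list(toy.columns)

print(summarize_num_cols(toy, cols))
print("share of rows with at least one outlier:", share_of_rows_with_outliers(toy, cols))
print(impute_with_flag(toy, cols).head())
```

If the row-loss share comes out near one half on the real data, clipping or a log transform of the skewed columns may be worth comparing against deletion, since both keep every row.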
