Adbot Ad Engagement Forecasting Challenge

Helping South Africa
$500 USD
Completed (~2 years ago)
Forecast
451 joined
112 active
Start: Apr 04, 24
Close: May 19, 24
Reveal: May 19, 24
Jaw22
Zindi africa
Num_cols Observations
Help · 9 Apr 2024, 14:46 · 0

My findings so far:
- With a raw LR model (0 imputation), the Breusch-Pagan test shows strong evidence of heteroskedasticity.
- The continuous num_cols are all right-skewed (including the target), except ad_description_len, which is left-skewed.
- There is very strong evidence of outliers: 3 cols have more than 30,000 outliers, 3 cols more than 20,000, one col more than 10,000 and one col more than 1,000. Deleting all the outliers would probably halve the train set. What implications does that have for preds and LB performance?
- Most num_cols already contain zeros ('0'), so imputing missing values with zero will create anomalies/bias. The challenge is to develop an imputation strategy that complements and enhances your algorithm choice.

Just sharing findings with peeps. Happy coding and competing, my fellow Zindians. Winter is coming!!!!
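For anyone who wants to reproduce these kinds of checks, here is a minimal NumPy sketch. It runs on synthetic data, not the competition files: the column names `col_a` and `col_b`, the 1.5 x IQR outlier rule, and the median-plus-flag imputation are all assumptions for illustration, not the method used for the counts above. It measures skewness, counts IQR outliers per column, shows how many rows would be lost if every outlier row were dropped, and compares that with a `log1p` transform and a median fill that keeps a "was missing" flag.

```python
import numpy as np

def skewness(x):
    """Sample skewness (biased Fisher-Pearson), ignoring NaNs."""
    x = x[~np.isnan(x)]
    centered = x - x.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

def iqr_outlier_mask(x, k=1.5):
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]; NaNs are never flagged."""
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    with np.errstate(invalid="ignore"):
        return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def impute_median_with_flag(x):
    """Fill NaNs with the column median and return a 0/1 'was missing' flag."""
    missing = np.isnan(x)
    filled = np.where(missing, np.nanmedian(x), x)
    return filled, missing.astype(int)

# Synthetic stand-ins (hypothetical names) for right-skewed numeric columns.
rng = np.random.default_rng(0)
n = 10_000
cols = {
    "col_a": rng.lognormal(mean=0.0, sigma=1.0, size=n),
    "col_b": rng.exponential(scale=2.0, size=n),
}
cols["col_b"][rng.random(n) < 0.10] = 0.0     # real zeros in the data
cols["col_a"][rng.random(n) < 0.05] = np.nan  # missing values

# Per-column skew and outlier counts, plus the union of flagged rows.
any_outlier = np.zeros(n, dtype=bool)
for name, x in cols.items():
    mask = iqr_outlier_mask(x)
    any_outlier |= mask
    print(name, "skew:", round(skewness(x), 2), "outliers:", int(mask.sum()))
print("rows lost if every outlier row is dropped:", int(any_outlier.sum()), "of", n)

# Alternative to deletion: log1p maps 0 to 0 and compresses the right tail.
print("col_a skew after log1p:", round(skewness(np.log1p(cols["col_a"])), 2))

# Alternative to zero-fill: the median keeps imputed rows apart from real zeros,
# and the flag lets the model see which values were originally missing.
filled, was_missing = impute_median_with_flag(cols["col_a"])
```

Because the outlier rows are combined across columns, the total row loss can be far larger than any single column's count, which is why deleting them all shrinks the train set so quickly. Whether a transform or a missing-value flag helps on the LB depends on the model, so treat this as a starting point to test rather than a recommendation.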

Discussion 0 answers