My findings so far:
- With a raw linear regression (LR) model (missing values imputed with 0), the Breusch-Pagan test gives strong evidence of heteroskedasticity.
- The continuous num_cols are all right-skewed (including the target), except ad_description_len, which is left-skewed.
- Very strong evidence of outliers: 3 columns have more than 30,000 outliers, 3 columns more than 20,000, one column more than 10,000 and one column more than 1,000. Just thinking out loud: deleting every row with an outlier will probably halve the train set. What implications does that have for predictions and LB performance?
- I also observed that most num_cols contain zeros ('0'), so imputing with zero will create anomalies/bias. The challenge: how do we develop an imputation strategy that complements and enhances our algorithm choice?

Just sharing findings with peeps. Happy coding and competing, my fellow Zindians. Winter is coming!!!!
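For anyone who wants to reproduce these checks, here is a minimal sketch using numpy, pandas and scipy on synthetic data (not the competition data). It computes the Koenker (studentized) form of the Breusch-Pagan statistic by hand; `het_breuschpagan` in `statsmodels.stats.diagnostic` is the off-the-shelf version. The post doesn't say how outliers were counted, so the 1.5 x IQR rule here is an assumption, and `breusch_pagan` / `iqr_outlier_mask` are hypothetical helper names. The last line shows why "delete all outliers" can eat half the train set: outliers in different columns usually sit on different rows, so the union of flagged rows is much larger than any single column's count.

```python
import numpy as np
import pandas as pd
from scipy import stats


def breusch_pagan(X: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Koenker (studentized) Breusch-Pagan test for an OLS fit of y on X.

    Returns (LM statistic, p-value); a small p-value is evidence of
    heteroskedasticity.
    """
    n = X.shape[0]
    Xc = np.column_stack([np.ones(n), X])          # add an intercept column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)  # main OLS fit
    u2 = (y - Xc @ beta) ** 2                      # squared residuals
    # Auxiliary regression: squared residuals on the same regressors
    gamma, *_ = np.linalg.lstsq(Xc, u2, rcond=None)
    fitted = Xc @ gamma
    r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    lm = n * r2                                    # LM = n * R^2
    return lm, stats.chi2.sf(lm, X.shape[1])       # df = number of regressors


def iqr_outlier_mask(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR] for its column."""
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    return (df < q1 - k * iqr) | (df > q3 + k * iqr)


# Synthetic right-skewed features; noise scale grows with the first feature
rng = np.random.default_rng(0)
n = 2000
X = rng.lognormal(size=(n, 3))
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(scale=X[:, 0])
lm, p = breusch_pagan(X, y)
print(f"BP LM = {lm:.1f}, p = {p:.3g}")

feats = pd.DataFrame(X, columns=["a", "b", "c"])
print(feats.skew())                # positive = right skew
mask = iqr_outlier_mask(feats)
print(mask.sum())                  # outlier count per column
print(mask.any(axis=1).mean())     # share of rows lost if every outlier row is dropped
```

Before dropping rows, it's worth comparing that last number against a log transform (e.g. `np.log1p`) of the right-skewed columns, which often shrinks both the skew and the outlier count without losing data.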
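On the zero-imputation problem: one common alternative (my suggestion, not something tested on this competition) is to fill missing values with a train-set median and add a missing-value flag, so the model can still tell a genuine 0 from a filled value. The sketch below is pandas only; `impute_with_indicators`, the `_was_missing` suffix and the `views` column are hypothetical. Tree-based models such as LightGBM or scikit-learn's `HistGradientBoostingRegressor` can also take NaN directly, which sidesteps imputation altogether.

```python
import numpy as np
import pandas as pd


def impute_with_indicators(train: pd.DataFrame, test: pd.DataFrame, cols: list[str]):
    """Median-impute `cols` (medians learned on train only) and add a
    `<col>_was_missing` flag so a true 0 stays distinguishable from a fill."""
    train, test = train.copy(), test.copy()
    for col in cols:
        med = train[col].median()  # median of observed train values (NaN skipped)
        for df in (train, test):
            df[f"{col}_was_missing"] = df[col].isna().astype(int)
            df[col] = df[col].fillna(med)
    return train, test


# Toy example: zeros are real values, NaN means missing
train = pd.DataFrame({"views": [0, 5, np.nan, 12, 0, 7]})
test = pd.DataFrame({"views": [np.nan, 3]})
tr, te = impute_with_indicators(train, test, ["views"])
print(tr)
print(te)
```

Learning the median on train only and reusing it for test keeps the LB submission consistent with what the model saw during training.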