Some columns in the training set contain NaN values. Should I drop those columns or use SimpleImputer to fill in the missing values before training? I would appreciate a detailed explanation. Thank you.
You can certainly drop the columns, but you may lose information. Alternatively, impute with SimpleImputer once you have explored the data and know which strategy suits each column; otherwise you could be adding noise. You could also impute manually, or fill the missing values with an arbitrary sentinel such as -99999: train.fillna(-99999, inplace=True).
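To make the three options concrete, here is a minimal sketch on made-up data. The column names (feature_a, feature_b, mostly_empty) and the 0.7 cutoff are assumptions for illustration only, not taken from your dataset; pick the cutoff and the imputation strategy after looking at your own columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the real training set.
train = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 4.0],
    "feature_b": [10.0, 20.0, np.nan, 60.0],
    "mostly_empty": [np.nan, np.nan, np.nan, 5.0],
})

# Explore first: fraction of missing values in each column.
missing_share = train.isna().mean()

# Option 1: drop only the columns that are mostly missing.
# The 0.7 threshold is an arbitrary choice for this sketch.
to_drop = missing_share[missing_share > 0.7].index
reduced = train.drop(columns=to_drop)

# Option 2: impute the remaining numeric columns with the column median.
imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(
    imputer.fit_transform(reduced),
    columns=reduced.columns,
    index=reduced.index,
)

# Option 3: fill every NaN with a sentinel value instead.
filled = train.fillna(-99999)
```

If you go with SimpleImputer, fit it on the training set and reuse the same fitted imputer (imputer.transform) on the test set so both are filled with the same values. For string columns, strategy="most_frequent" is the usual choice, since the median is only defined for numbers.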
I want to know whether it is appropriate to drop the "TENURE" column. Which columns need to be dropped?