I used the mean for the numerical columns, then dropped the rows with missing values in the categorical column. I didn't want to use the mode for the categorical column. What are your thoughts?
I thought of maybe using an algorithm to predict the missing values: linear regression or a random forest regressor for the numerical values, then a classifier for the categorical ones. Or maybe one-hot encode the categorical features first, then use a regressor to predict the missing values. I believe filling the missing values appropriately was the main task for this project. Or even a nearest-neighbour technique like KNN. I don't think simple statistics like the mean, mode, or median were appropriate for this task, since there are so many missing values. Just my thoughts, though.
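For anyone who wants to try the KNN idea, here is a minimal sketch using scikit-learn's `KNNImputer`. The toy data and the column names (`bedroom`, `bathroom`, `price`) are assumptions for illustration, not the project's actual schema, and `n_neighbors=2` is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data; the column names are assumptions, not the real dataset.
df = pd.DataFrame({
    "bedroom": [3, 4, np.nan, 5, 2, 4],
    "bathroom": [2, 4, 3, np.nan, 2, 5],
    "price": [1.2e6, 3.5e6, 2.0e6, 4.1e6, np.nan, 3.9e6],
})

# Each missing value is replaced by the average of that column
# over the 2 nearest rows (distance computed on the observed columns).
imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)

print(filled)
```

Note that the distance is scale-sensitive, so a large-valued column like price will dominate unless the features are scaled first. `KNNImputer` also only handles numeric columns, so categorical columns would need encoding or a separate approach.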
I used the mean for the numerical columns, and found an insight that let me fill some missing values in the title column when a certain condition holds. For example, most houses priced at 3 million and above are mansions, so I filled the missing titles that met that condition with "mansion". I used that for the title column and dropped the rows with missing values in the loc column.
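Roughly, that approach could look like the sketch below in pandas. The toy rows are made up for illustration; only the column names `title`, `loc`, and `price` and the 3 million threshold come from the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; the values are invented for illustration.
df = pd.DataFrame({
    "title": ["mansion", None, "bungalow", None, "mansion"],
    "loc": ["A", "B", None, "C", "D"],
    "price": [3.5e6, 4.0e6, 8.0e5, 9.0e5, np.nan],
})

# Numerical column: fill missing values with the column mean.
df["price"] = df["price"].fillna(df["price"].mean())

# Title column: fill only the missing titles where price >= 3 million.
mask = df["title"].isna() & (df["price"] >= 3_000_000)
df.loc[mask, "title"] = "mansion"

# Loc column: drop the rows where it is missing.
df = df.dropna(subset=["loc"])

print(df)
```

Missing titles that do not meet the condition stay missing here, so they would still need another rule or would have to be dropped.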
Which missing data? At branch level or at daily level?
Look into MICE (Multiple Imputation by Chained Equations). It can be very useful.
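One way to try a MICE-style approach in Python is scikit-learn's `IterativeImputer`, which models each column with missing values as a function of the other columns in round-robin fashion. This is only a sketch: the toy data and column names are assumptions, and as written it produces a single imputed dataset rather than the multiple imputations that full MICE calls for.

```python
import numpy as np
import pandas as pd
# IterativeImputer is still experimental, so this import is needed to enable it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy data; the column names are assumptions, not the real dataset.
df = pd.DataFrame({
    "bedroom": [3, 4, np.nan, 5, 2, 4, 6, 3],
    "bathroom": [2, 4, 3, np.nan, 2, 5, 6, 3],
    "price": [1.2e6, 3.5e6, 2.0e6, 4.1e6, np.nan, 3.9e6, 5.0e6, 1.5e6],
})

# Each column with gaps is regressed on the others, repeated for up to 10 rounds.
imputer = IterativeImputer(max_iter=10, random_state=0)
filled = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)

print(filled)
```

To get closer to true multiple imputation, one option is to set `sample_posterior=True` and run the imputer several times with different `random_state` values, then compare or pool the results.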
Thanks