Hi, is there any way to handle those missing values without dropping any columns?
Fill them in, probably with the mean, mode, median, or zeros.
You can fill with zeros (not advisable, since some of the features already contain zeros) or with the mean, mode, or median instead. If you are using tree-based models, imputing may not be necessary: many implementations (for example XGBoost and LightGBM) handle missing values themselves.
Do you know how to fill all columns with the mean in one line of code, or in a more efficient way than a for loop?
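I usually use df['column_name'] = df['column_name'].fillna(df['column_name'].mean()), one column at a time. Note that fillna returns a new Series, so the result has to be assigned back.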
df.fillna(df.mean(numeric_only=True))
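To make the one-liner concrete, here is a small sketch on a made-up DataFrame (the column names and values are assumptions for illustration only). Passing numeric_only=True restricts the means to numeric columns, so a text column is simply left as it is.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "income": [1000.0, 3000.0, np.nan],
    "city": ["Lagos", None, "Accra"],
})

# Means of the numeric columns only (a Series indexed by column name).
col_means = df.mean(numeric_only=True)

# fillna with a Series fills each column with the value under the matching label.
# Columns without an entry in col_means (here "city") are not touched.
filled = df.fillna(col_means)

print(filled)
```

fillna returns a new DataFrame here, so df itself still contains the missing values unless you assign the result back.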
Ok. Thanks
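You can use the SimpleImputer class provided in sklearn.impute:
import pandas as pd
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')  # set strategy to 'mean' to fill with the column mean
imputer.fit(df)  # with strategy='mean', df must contain only numeric columns
# transform returns a NumPy array, so wrap it to get a DataFrame back
filled_df = pd.DataFrame(imputer.transform(df), columns=df.columns, index=df.index)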
You can also forward fill (ffill) and/or backward fill (bfill), e.g. df.ffill() or df.fillna(method='ffill') (the method argument is deprecated in recent pandas versions). Check https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html for more.
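A quick sketch of what forward and backward filling do, on a hypothetical single-column frame (the column name "reading" is made up). Forward fill cannot fill a gap at the very start, and backward fill cannot fill one at the very end, so chaining the two is a common way to cover both.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered data with gaps at the start, middle, and end.
df = pd.DataFrame({"reading": [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan]})

# Forward fill: carry the last seen value forward (the leading NaN stays).
forward = df.ffill()

# Backward fill: pull the next seen value backward (the trailing NaN stays).
backward = df.bfill()

# Forward fill first, then backward fill whatever is still missing.
both = df.ffill().bfill()

print(both)
```

These methods only make sense when row order is meaningful, for example in time series.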
If I want to fill all the columns with the same method, say the mean, I sometimes write a function and use aggregate to apply it:
def fill_with_mean(col):
    return col.fillna(col.mean())
You can then apply it to all columns of the DataFrame (df) by writing:
df.agg(fill_with_mean)
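Here is a runnable sketch of the same idea, assuming a made-up DataFrame that also has a text column. It uses df.apply rather than df.agg, and adds a dtype check (my addition, not part of the function above) so that non-numeric columns are returned unchanged instead of causing an error in col.mean().

```python
import numpy as np
import pandas as pd


def fill_with_mean(col):
    """Fill NaNs with the column mean; return non-numeric columns unchanged."""
    if pd.api.types.is_numeric_dtype(col):
        return col.fillna(col.mean())
    return col


# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 10.0, 20.0],
    "label": ["x", None, "z"],
})

# apply calls fill_with_mean once per column and reassembles a DataFrame.
filled = df.apply(fill_with_mean)

print(filled)
```

Swapping col.mean() for col.median() gives the median-fill variant with no other changes.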