Bayero university kano
Preprocessing
Help · 11 Sep 2020, 14:01 · 8

Hi, is there any way to handle the missing values without dropping any columns?

Discussion 8 answers
Sayrikey1
University of Lagos

Fill them in, probably with the mean, mode, median, or zeros.

11 Sep 2020, 14:08
Upvotes 0

You can fill in with zeros (not advisable, since some of the features already contain zeros), or fill with the mean, mode, or median instead. If you are using tree-based models, imputing missing values may not be necessary, as some implementations (for example XGBoost and LightGBM) can handle missing values themselves.

11 Sep 2020, 14:19
Upvotes 0

Do you know how someone can fill all columns with the mean in one line of code, or in a more efficient way than a for loop?

I usually use df['column_name'].fillna(df.column_name.mean())

Sayrikey1
University of Lagos

df.fillna(df.mean())
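
A small sketch of that one-liner on a toy frame (the column names and values are made up). One assumption worth flagging: if the frame also has text columns, recent pandas versions may need numeric_only=True so that the mean is only computed for numeric columns; the text columns are then left untouched.

```python
import numpy as np
import pandas as pd

# Toy data, for illustration only.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "income": [100.0, 200.0, np.nan],
    "city": ["Kano", None, "Lagos"],
})

# df.mean(numeric_only=True) gives one mean per numeric column;
# fillna matches those means to the columns by label.
filled = df.fillna(df.mean(numeric_only=True))
```

Note that fillna returns a new DataFrame, so the result has to be assigned back if you want to keep it.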

Federal university of technology minna

You can use the SimpleImputer class provided in sklearn.impute:

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='mean')  # set strategy to 'mean' if you want to fill with the mean
imputer.fit(df)
filled_df = imputer.transform(df)  # returns a NumPy array by default, not a DataFrame

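You can also forward fill (ffill) and/or backward fill (bfill), e.g. df.fillna(method='ffill'). In newer pandas versions the method argument is deprecated, and df.ffill() / df.bfill() do the same thing. Check https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html for more.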
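
Putting both suggestions together, here is a minimal sketch on a made-up numeric frame. Wrapping the imputer output back into a DataFrame is my own addition, since transform returns a plain array by default, and it assumes no column is entirely missing.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy numeric data, for illustration only.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 6.0]})

# Mean imputation; rebuild a DataFrame so column names and index are kept.
imputer = SimpleImputer(strategy="mean")
filled_df = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)

# Forward fill, then backward fill to catch missing values in the first row.
ff_bf = df.ffill().bfill()
```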

12 Sep 2020, 02:08
Upvotes 0

If I want to fill all the columns with the same filling method, say fill with the mean, I sometimes write a function and use agg (aggregate) to apply it.

def fill_with_mean(col):
    return col.fillna(col.mean())

You can then apply it to all the columns of the dataframe (df) by writing:

df.agg(fill_with_mean)
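
The same per-column idea can be stretched to mixed data. This is only a sketch: fill_column is a hypothetical helper, the data is made up, and using the mode for non-numeric columns is my assumption rather than something from the thread. It uses df.apply, which calls the function once per column.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def fill_column(col):
    """Fill numeric columns with the mean, other columns with the most frequent value."""
    if is_numeric_dtype(col):
        return col.fillna(col.mean())
    mode = col.mode()
    # mode() is empty when the whole column is missing; leave it unchanged then.
    return col.fillna(mode.iloc[0]) if not mode.empty else col

# Toy data, for illustration only.
df = pd.DataFrame({"score": [1.0, np.nan, 5.0], "grade": ["a", "a", None]})
filled = df.apply(fill_column)
```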

12 Sep 2020, 12:25
Upvotes 0