💰 Data Talk: Preprocessing

Aslim

Bayero University Kano

Preprocessing

Help · 11 Sep 2020, 14:01 · 8

Hi, is there any way to handle those missing values without dropping any column?

Discussion 8 answers

Sayrikey1

University of Lagos

Fill them in, probably with the mean, mode, median, or zeros.

11 Sep 2020, 14:08

Upvotes 0

Leo

You can fill with zeros (not advisable, since some of the features already contain zeros) or with the mean, mode, or median instead. If you are using tree-based models, imputing missing values may not be necessary, as some implementations can handle them natively.

11 Sep 2020, 14:19

Upvotes 0

BizzyVinci

Do you know how to fill all columns with the mean in one line of code, or in a more efficient way than a for loop?

I usually use df['column_name'] = df['column_name'].fillna(df['column_name'].mean())

replied to Leo · 12 Sep 2020, 02:12

Upvotes 0

Sayrikey1

University of Lagos

df = df.fillna(df.mean(numeric_only=True))
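A small sketch of that one-liner on a made-up frame (the column names are only for illustration). It assumes the frame may contain text columns, which is why numeric_only=True is passed when computing the means.

```python
import numpy as np
import pandas as pd

# Made-up example frame with two numeric columns and one text column.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "income": [100.0, 200.0, np.nan],
    "city": ["Kano", None, "Lagos"],
})

# numeric_only=True skips the text column when computing the means,
# so only the numeric columns get filled; "city" keeps its missing value.
filled = df.fillna(df.mean(numeric_only=True))
```

fillna returns a new DataFrame, so df itself is unchanged unless you assign the result back.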

replied to BizzyVinci · 12 Sep 2020, 06:52

Upvotes 0

BizzyVinci

Ok. Thanks

replied to Sayrikey1 · 12 Sep 2020, 10:34

Upvotes 0

Omotade

Federal University of Technology Minna

You can use the class provided in sklearn.impute

from sklearn.impute import SimpleImputer

# set strategy to 'mean' if you want to fill with the mean
imputer = SimpleImputer(strategy='mean')
imputer.fit(df)
# transform returns a NumPy array by default, not a DataFrame
filled_df = imputer.transform(df)
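One thing to keep in mind with SimpleImputer is that, by default, its output is a NumPy array, so the column names are lost. A minimal sketch of keeping them, assuming an all-numeric frame (the data here is made up):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up numeric frame; strategy="mean" only works on numeric columns.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 6.0]})

imputer = SimpleImputer(strategy="mean")
# fit_transform returns a NumPy array by default, so wrap it in a
# DataFrame to keep the original column names and index.
filled_df = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)
```

The advantage over plain fillna is that the fitted imputer can be reused, for example calling imputer.transform on a test set so it is filled with the training means.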

replied to BizzyVinci · 14 Sep 2020, 06:05

Upvotes 0

BizzyVinci

You can also forward fill (ffill) and/or backward fill (bfill), e.g. df.fillna(method='ffill'). Newer pandas versions deprecate the method argument in favour of df.ffill() and df.bfill(). Check https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html for more.
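A quick sketch of how forward and backward fill behave, using the df.ffill() and df.bfill() shorthands on a made-up column. This kind of fill usually only makes sense when the row order means something, such as a time series.

```python
import numpy as np
import pandas as pd

# Made-up ordered readings with gaps at the start, middle, and end.
df = pd.DataFrame({"temp": [np.nan, 21.0, np.nan, np.nan, 24.0, np.nan]})

# Forward fill copies the last seen value down; the leading NaN stays
# because there is nothing before it to copy.
forward = df.ffill()

# Chaining a backward fill afterwards covers that leading gap.
both = df.ffill().bfill()
```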

12 Sep 2020, 02:08

Upvotes 0

Set_2011_orphan

If I want to fill all the columns with the same method, say the mean, I sometimes write a function and use agg to apply it:

def fill_with_mean(col):
    return col.fillna(col.mean())

You can then apply it to every column of the dataframe (df) by writing:

df.agg(fill_with_mean)
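If the dataframe mixes numeric and text columns, the same per-column idea can be extended. Below is a sketch with a hypothetical helper, fill_column (my own name, not something from pandas), that uses the mean for numeric columns and the mode for everything else. It uses df.apply here, which calls the function once per column.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def fill_column(col):
    """Hypothetical helper: mean for numeric columns, mode for the rest."""
    if is_numeric_dtype(col):
        return col.fillna(col.mean())
    modes = col.mode()  # mode() ignores missing values by default
    # An all-missing column has no mode, so leave it untouched.
    return col.fillna(modes.iloc[0]) if not modes.empty else col

# Made-up frame with one numeric and one text column.
df = pd.DataFrame({
    "score": [1.0, np.nan, 3.0],
    "grade": ["A", "A", None],
})
filled = df.apply(fill_column)
```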

12 Sep 2020, 12:25

Upvotes 0
