Guys, did you use any high-level methods to fill NaN values, or did you just leave them as they are? There are many NaNs in the data, and filling them with basics like mean/median/mode, or even imputers such as KNN or MICE, is not really a good idea, as it only adds noise and changes the real statistical distribution of the values.
Since the data is an hourly average of weather indicators over 5 days (121 hours), I think a good way to deal with such NaN values is to loop through the series and average on an hourly basis, maybe over 8-hour intervals or so. I also think there are probably missing values hiding in the temp column, since there are 0.0 values for temperature, which isn't possible in Uganda.
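A minimal sketch of that idea in pandas, assuming a 121-hour series (the dates, values, and gap positions below are made up for illustration): treat impossible 0.0 readings as missing, then fill gaps with a centered 8-hour rolling mean instead of a global statistic.

```python
import numpy as np
import pandas as pd

# Hypothetical 121 hourly temperature readings following a daily cycle
temp = pd.Series(
    20 + 5 * np.sin(np.arange(121) * 2 * np.pi / 24),
    index=pd.date_range("2019-01-01", periods=121, freq="h"),
)
temp.iloc[[10, 11, 50, 99]] = np.nan  # simulate a few missing hours

# Physically impossible 0.0 readings are treated as missing too
temp = temp.mask(temp == 0.0)

# Fill each gap with the mean of an 8-hour window around it;
# min_periods=1 lets the window ignore the NaNs it contains
filled = temp.fillna(temp.rolling(8, center=True, min_periods=1).mean())
```

This keeps each imputed value close to its local neighbourhood rather than dragging it toward the global mean.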
It is better not to fill null values at all than to absurdly fill them with the mean, median, or mode in this particular problem statement. That said, some variables do follow a certain pattern of values. Analyze each variable separately and fill its nulls accordingly if you see a pattern.
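One way to sketch that per-variable approach (the column names, data, and the mostly-zero heuristic below are all illustrative assumptions, not the competition's actual schema):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the weather data (made-up columns)
df = pd.DataFrame({
    "precip": [0.0, np.nan, 0.0, 0.0, 2.1, np.nan],
    "temp":   [21.0, 22.0, np.nan, 24.0, 23.5, np.nan],
})

for col in df.columns:
    zero_share = (df[col] == 0).mean()
    if zero_share > 0.5:
        # Mostly-zero variables (e.g. precipitation): a missing value
        # most plausibly means "no rain", so fill with 0
        df[col] = df[col].fillna(0.0)
    else:
        # Smoothly varying variables: linear interpolation preserves
        # the local trend; limit_direction="both" also fills the edges
        df[col] = df[col].interpolate(limit_direction="both")
```

The point is that the fill rule comes from inspecting each variable's pattern, not from one blanket statistic.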
@Krishna_Priya In fact, you are very right. I just finished my EDA on these datasets, and I could see many features with very different patterns, like you said. Taking them one by one is my concern right now, considering that we have 770 numerical variables (3 discrete, e.g. location, min_precip, median_precip, and the rest continuous) after using label encoding for the 'location' variable. Now the only categorical variable is the ID. Truly stressful, but we are getting there gradually.
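For anyone following along, that setup could be sketched like this (a toy frame with invented values; the cardinality/integer heuristic for splitting discrete from continuous is my assumption, not part of the original post):

```python
import pandas as pd

# Hypothetical slice of the data: 'location' categorical, the rest numeric
df = pd.DataFrame({
    "ID": ["a1", "a2", "a3"],
    "location": ["Kampala", "Gulu", "Kampala"],
    "min_precip": [0, 0, 1],
    "mean_temp": [21.3, 19.8, 22.5],
})

# Label-encode 'location' (codes assigned in order of first appearance)
df["location"], categories = pd.factorize(df["location"])

# Split the remaining numeric columns into discrete vs continuous:
# low cardinality + whole-number values => treat as discrete
numeric = df.drop(columns="ID")
discrete = [c for c in numeric
            if numeric[c].nunique() <= 10 and (numeric[c] % 1 == 0).all()]
continuous = [c for c in numeric if c not in discrete]
```

Grouping the 770 variables this way lets you apply one fill strategy per group instead of hand-tuning each column.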
@Adeyinka_Michael You are on the right track. Yeah, this data certainly requires a bit of extensive research :)