Urban Air Pollution Challenge by #ZindiWeekendz

Lawrence_Moruye

SPLIT OF DATA

Notebooks · 11 Apr 2020, 09:28 · 1

I'm trying to understand how the train and test datasets have been split. I think place_id indicates the different regions in which the data was collected. Train has 349 different regions and test has 179, and none of the regions in train appear in test. So we are building a model on data from one set of regions and applying it to predict pollution in other regions. Assuming these regions are different cities in Africa, what is the probability that a model trained on Tunis data will accurately forecast air pollution in Mogadishu? And what is the relationship between the regions, given that the data doesn't include geolocation? I think I must have misunderstood something...
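For anyone who wants to check the split themselves, here is a minimal sketch in plain Python. It assumes each row carries a place_id field as described above. The toy rows, the `reading` field, and the helper names `place_overlap` and `group_holdout` are made up for illustration and are not part of the competition files. It shows two things: how to confirm that no place_id is shared between two sets of rows, and how to build a local validation split that keeps every place_id wholly on one side, which mirrors the way train and test appear to be separated.

```python
import random


def place_overlap(rows_a, rows_b, key="place_id"):
    """Return the set of group ids that appear in both collections of rows."""
    ids_a = {row[key] for row in rows_a}
    ids_b = {row[key] for row in rows_b}
    return ids_a & ids_b


def group_holdout(rows, key="place_id", holdout_frac=0.3, seed=0):
    """Split rows so that every group lands wholly in one side of the split."""
    groups = sorted({row[key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Hold out whole groups, never individual rows.
    n_holdout = max(1, round(len(groups) * holdout_frac))
    holdout_ids = set(groups[:n_holdout])
    fit = [r for r in rows if r[key] not in holdout_ids]
    val = [r for r in rows if r[key] in holdout_ids]
    return fit, val


# Toy rows standing in for the competition files (all values are made up).
train = [{"place_id": p, "reading": i} for i, p in enumerate("AABBCCDDEE")]
test = [{"place_id": p, "reading": i} for i, p in enumerate("FFGG")]

# An empty set means train and test share no regions.
print(place_overlap(train, test))

# A local validation split that also shares no regions with the fitting data.
fit, val = group_holdout(train)
print(place_overlap(fit, val))
```

Validating on held-out regions rather than on random rows should give a score closer to what the model will face on unseen places. If scikit-learn is available, `GroupKFold` from `sklearn.model_selection` applies the same idea across several folds.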

Discussion 1 answer

Olayinka_Fadahunsi

I think the focus should be on the readings. This is why the model performs worse when it is fed location data.

11 Apr 2020, 13:19
