
I'm trying to understand how the train and test datasets have been split. I think place_id indicates the different regions in which the data was collected. Train has 349 different regions and test has 179. Regions in train are not included in test, so we are building a model on data from some regions and applying it to predict pollution in other regions. Assuming these regions are different cities in Africa, what is the probability that a model trained on Tunis data will accurately forecast air pollution in Mogadishu? And what is the relationship between the regions, given that the data doesn't include geolocation? I think I must have misunderstood something...
I think the focus should be on the readings themselves. That would explain why the model performs worse when fed location data.
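
If it helps, here is a rough sketch of how one could check that the place_id values in train and test really don't overlap, and then validate locally in a way that mimics the split by holding out whole regions. The rows, the place_id values, the `reading` field and the `group_folds` helper are all made up for illustration; only the place_id column name comes from the thread.

```python
import random
from collections import defaultdict

# Toy rows standing in for the real files; place_id values and the
# "reading" field are invented for this example.
train_rows = [
    {"place_id": "A", "reading": 12.0},
    {"place_id": "A", "reading": 15.5},
    {"place_id": "B", "reading": 40.1},
    {"place_id": "C", "reading": 22.3},
    {"place_id": "C", "reading": 19.8},
    {"place_id": "D", "reading": 31.0},
]
test_rows = [
    {"place_id": "X", "reading": 18.2},
    {"place_id": "Y", "reading": 27.4},
]

# Step 1: confirm that no region appears on both sides of the split.
train_places = {r["place_id"] for r in train_rows}
test_places = {r["place_id"] for r in test_rows}
overlap = train_places & test_places
print(len(train_places), len(test_places), len(overlap))


def group_folds(rows, n_folds, seed=0):
    """Yield (train_idx, valid_idx) so that no place_id spans both sides."""
    by_place = defaultdict(list)
    for i, row in enumerate(rows):
        by_place[row["place_id"]].append(i)
    places = sorted(by_place)
    random.Random(seed).shuffle(places)
    for k in range(n_folds):
        # Every n_folds-th region is held out entirely for validation.
        held_out = set(places[k::n_folds])
        valid_idx = [i for p in places if p in held_out for i in by_place[p]]
        train_idx = [i for p in places if p not in held_out for i in by_place[p]]
        yield train_idx, valid_idx


# Step 2: build folds where validation regions are unseen during training.
folds = list(group_folds(train_rows, n_folds=2))
```

Scoring a model on folds like these should give a more honest estimate of how it does on regions it has never seen than a plain random row split would. With the real data in a DataFrame, scikit-learn's `GroupKFold` with `groups=df["place_id"]` does the same kind of split.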