
AirQo African Air Quality Prediction Challenge

$3 000 USD
Completed (over 1 year ago)
Prediction
1029 joined
514 active
Start: Mar 15, 24
Close: Jun 16, 24
Reveal: Jun 16, 24
yanteixeira
Latitude and Longitude
Help · 23 May 2024, 19:59 · 5

Hello fellow Zindians,

I would like to know how you all are dealing with these two features. I have a feeling that it is not correct to treat them as numerical features because GBDTs split one feature at a time. This univariate splitting can miss the complex interaction between latitude and longitude that represents true geographic proximity. However, at the same time, I have not yet found a strong reason not to use them. So far, I have combined the two into one categorical feature.

I have tried countless transformations and new features, but none have convinced me.
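To make the two options above concrete, here is a minimal sketch of what they could look like. It builds a single categorical key from the rounded coordinates and adds a few rotated projections of latitude and longitude, so that a single-feature split corresponds to a diagonal cut on the map. The function name `location_features`, the rounding precision, the choice of angles, and the sample coordinates are all assumptions for illustration, not something taken from the competition data.

```python
import math


def location_features(lat, lon, angles_deg=(15, 30, 45, 60, 75), decimals=2):
    """Build illustrative location features from one latitude/longitude pair."""
    feats = {
        # Combined categorical key: rounded coordinates joined into one string.
        "loc_id": f"{round(lat, decimals)}_{round(lon, decimals)}",
    }
    for angle in angles_deg:
        theta = math.radians(angle)
        # Projection of (lon, lat) onto an axis rotated by `angle` degrees.
        # A threshold on this value is an oblique cut in the lat/lon plane,
        # which a tree cannot express with one split on lat or lon alone.
        feats[f"rot_{angle}"] = lon * math.cos(theta) + lat * math.sin(theta)
    return feats


# Illustrative coordinates only (not taken from the competition data).
row = location_features(0.3476, 32.5825)
print(row)
```

The categorical key treats every location as unrelated to every other, while the rotated projections keep the notion of distance. Neither helps if the test set contains locations far from anything seen in training.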

Discussion 5 answers
Gabriel_Figueiro

I think that the coordinates caused the model to memorize the patterns of the cities, but when we try to predict on the testing set, it doesn't work because there are different cities.
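One way to check for this kind of memorization is to validate on cities the model has never seen, holding out one city at a time. The sketch below is only an illustration in plain Python: the helper `leave_one_city_out` and the city labels are hypothetical, and it assumes a per-row city label is available. scikit-learn's `GroupKFold` and `LeaveOneGroupOut` implement the same idea.

```python
from collections import defaultdict


def leave_one_city_out(cities):
    """Yield (held_out_city, train_idx, valid_idx), holding out each city once."""
    rows_by_city = defaultdict(list)
    for i, city in enumerate(cities):
        rows_by_city[city].append(i)

    for held_out in sorted(rows_by_city):
        valid_idx = rows_by_city[held_out]
        # Training rows are all rows from every other city.
        train_idx = sorted(
            i for city, idx in rows_by_city.items() if city != held_out for i in idx
        )
        yield held_out, train_idx, valid_idx


# Hypothetical city labels, one per training row.
cities = ["A", "A", "B", "C", "B", "C", "A"]
folds = list(leave_one_city_out(cities))
for held_out, train_idx, valid_idx in folds:
    print(held_out, train_idx, valid_idx)
```

If a model with coordinate features scores much better under a random split than under a split like this, that gap is a sign it is learning city identity rather than something that transfers.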

24 May 2024, 00:51
Upvotes 1
yanteixeira

Good answer. I think the same applies to other features as well.

Mugisha_

Given that the model will be applied to other locations at inference time, it generally doesn't make sense to train with any location-based features, even though the way the data was curated seems to encourage it.

On the other hand, there isn't enough pollutant data to usefully train a model that predicts pm2_5 concentrations from pollutant features alone, so training with latitude- and longitude-based features is what yields better scores for me.

24 May 2024, 23:01
Upvotes 1
yanteixeira

Funny to see that other participants are also experiencing this dilemma. Super interesting competition so far!

Yes, I agree with your point. I also think we need other countries in the training set.