
GeoAI Challenge for Air Pollution Susceptibility Mapping by ITU

Helping Italy
$1 000 USD
Challenge completed ~2 years ago
Forecast
220 joined
35 active
Start
Jul 21, 23
Close
Oct 14, 23
Reveal
Oct 14, 23
Juliuss
Freelance
Variable Definition
Data · 23 Jul 2023, 21:42 · 12

@zindi says:

"Test resembles Train.csv but without the target-related columns. This is the dataset on which you will apply your model to."

On inspection, the test file has only 1 column while the train file has 62 columns. Am I missing something here?

A variable definition file explaining the various datasets and the meaning of each column would be important here. Any help, @Zindi?

Discussion 12 answers

Had the same question! Thanks for asking! Also, what does the ImageID mean? And the 8 digits that follow it... are they a special code or something?

23 Jul 2023, 21:44
Upvotes 0
Juliuss
Freelance

Yeah, this is another issue. Supposing the test file provided is not the correct one and the correct one is eventually uploaded, maybe its ID column will match the one in the submission file and remove the inconsistency.

Thank you for pointing this out. The previous test.csv file was incorrect and has now been updated. Note that the new file has only 4 columns: ID, lat, lng, and season. This is intentional: contestants are expected to decide and design how they will use the training data (either the provided data or other open data).

27 Jul 2023, 08:41
Upvotes 0
Juliuss
Freelance

Thank you for the response. This is well noted.

Hi, I am still unsure about the test data. Are we supposed to use only lat, lng, and season to build the model? This is very confusing...

Hello, you could, but you are not supposed to. The idea is to use the meteorological and topographic data provided, as well as any other open data that you find useful for that purpose. The test file contains only lat, lng, and season to encourage exactly that.
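One possible way to act on this, as a minimal sketch rather than the organizers' intended method: for each test row (ID, lat, lng, season), look up the closest location in the training data for the same season and borrow its features. The feature names `temperature` and `elevation`, the sample values, and the helper `attach_nearest_features` are hypothetical and only illustrate the idea; the real column names come from the competition files.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def attach_nearest_features(test_rows, train_rows, feature_names):
    """Copy features from the nearest same-season training row onto each test row."""
    enriched = []
    for t in test_rows:
        # Prefer training rows from the same season; fall back to all rows.
        candidates = [r for r in train_rows if r["season"] == t["season"]] or train_rows
        nearest = min(
            candidates,
            key=lambda r: haversine_km(t["lat"], t["lng"], r["lat"], r["lng"]),
        )
        row = dict(t)
        for name in feature_names:
            row[name] = nearest[name]
        enriched.append(row)
    return enriched

# Made-up rows; "temperature" and "elevation" are hypothetical feature names.
train_rows = [
    {"lat": 45.46, "lng": 9.19, "season": "winter", "temperature": 3.0, "elevation": 120.0},
    {"lat": 41.90, "lng": 12.50, "season": "winter", "temperature": 8.0, "elevation": 21.0},
    {"lat": 45.46, "lng": 9.19, "season": "summer", "temperature": 25.0, "elevation": 120.0},
]
test_rows = [{"ID": "example_1", "lat": 45.50, "lng": 9.20, "season": "winter"}]

enriched = attach_nearest_features(test_rows, train_rows, ["temperature", "elevation"])
```

A brute-force search like this is fine for small files. For larger ones a spatial index would be a better fit, and interpolating between several neighbours may work better than copying a single one.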

cephars
Free Lance

I have not tried it, but I am starting to see the logic. I guess the other datasets are meant to be our test dataset. I hope this is what you guys did.

Juliuss
Freelance

👏🏾How are you guys getting a perfect score @xiaoironman and @yanteixeira?

12 Aug 2023, 08:27
Upvotes 0

I think the season info plays an important role here, but it might also be due to luck. Maybe when tested on the remaining 80% of the test data, the accuracy will be lower, so I'm not quite sure.
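To put a rough number on the "luck" part, here is a small simulation sketch. All figures in it are made up for illustration (100 test rows, a model that gets exactly 90 of them right, a 20% public split); the actual test size and split of this challenge are not stated in this thread.

```python
import math
import random

def public_scores(correct, public_fraction, trials, seed=0):
    """Accuracy on many random public subsets of a fixed set of predictions."""
    rng = random.Random(seed)
    k = max(1, round(len(correct) * public_fraction))
    scores = []
    for _ in range(trials):
        sample = rng.sample(correct, k)  # draw a public split without replacement
        scores.append(sum(sample) / k)
    return scores

# Hypothetical model: 90 correct and 10 wrong predictions out of 100 test rows.
correct = [True] * 90 + [False] * 10

scores = public_scores(correct, public_fraction=0.20, trials=1000)
mean_public = sum(scores) / len(scores)
perfect_share = sum(s == 1.0 for s in scores) / len(scores)

# Exact chance that all 20 public rows fall among the 90 correct ones.
exact_perfect = math.comb(90, 20) / math.comb(100, 20)

print(f"mean public accuracy: {mean_public:.3f}")
print(f"share of perfect public scores: {perfect_share:.3f}")
print(f"exact probability of a perfect public score: {exact_perfect:.3f}")
```

Under these assumptions, a model that is wrong on 10% of the full test set still shows a perfect public score in roughly one out of ten random splits, so a perfect score on the 20% alone does not guarantee the same result on the remaining 80%.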

Juliuss
Freelance

If your model gets a perfect score on the 20%, it most probably means it is generalizing well to unseen data in most cases... hopefully there is a balanced distribution in this tiny test data. Will try to catch up with you, xiaoironman :)

Good luck

cephars
Free Lance

I have seen you guys on the leaderboard. How did you resolve this?

3 Oct 2023, 11:12
Upvotes 0
cephars
Free Lance

I mean the test dataset?