⚠️ Challenge Chat: Variable Definition

GeoAI Challenge for Air Pollution Susceptibility Mapping by ITU

Helping Italy

$1,000 USD

Completed (almost 3 years ago)

Skills you will learn: Forecast

223 joined · 35 active


Dates: Jul 21, 2023 to Oct 14, 2023 · Reveal: Oct 14, 2023

Juliuss

Freelance

Variable Definition

Data · 23 Jul 2023, 21:42 · 12

@zindi says:

"Test resembles Train.csv but without the target-related columns. This is the dataset on which you will apply your model to."

On inspection, the test file has only 1 column while train has 62. Am I missing something here?

A variable definition file explaining the various datasets and what each column means would be very helpful here. Any help, @Zindi?

Discussion 12 answers

xiaoironman

Had the same question! Thanks for asking! Also, what does ImageID mean? And the 8 digits that follow: are they a special code or something?

23 Jul 2023, 21:44


Juliuss

Freelance

Yeah, this is another issue. Assuming the test file provided is not the correct one and the correct one is eventually uploaded, its ID column may match the submission file and remove the inconsistency.

replied to xiaoironman · 23 Jul 2023, 21:49


apugliese

Thank you for pointing this out. The previous test.csv file was incorrect; it has now been updated. Note, however, that the new file has only 4 columns: ID, lat, lng, and season. This is intentional: it lets contestants decide how they will use the training data (either the data provided or other open data) and design their approach accordingly.

27 Jul 2023, 08:41

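Since the updated test.csv is said to contain only ID, lat, lng, and season, a quick header check before modelling shows which training columns have to be rebuilt for the test points from other data. Below is a minimal sketch using only the Python standard library; the inline CSV strings, every train column other than those four (including the "target" name), and the helper names are hypothetical stand-ins for the real files.

```python
import csv
import io

# Hypothetical stand-ins for the real files; in practice open Train.csv and test.csv.
# Only the four test columns come from the thread; the other train columns are made up.
TRAIN_CSV = "ID,lat,lng,season,temperature,elevation,target\nA1,41.9,12.5,winter,8.1,21,1\n"
TEST_CSV = "ID,lat,lng,season\nB1,45.4,9.2,summer\n"

EXPECTED_TEST_COLUMNS = ["ID", "lat", "lng", "season"]


def read_header(text: str) -> list[str]:
    """Return the header row of a CSV given as a string."""
    return next(csv.reader(io.StringIO(text)))


def check_columns(train_text: str, test_text: str) -> tuple[list[str], list[str]]:
    """Verify the test header and list the train columns the test file lacks."""
    train_cols = read_header(train_text)
    test_cols = read_header(test_text)
    if test_cols != EXPECTED_TEST_COLUMNS:
        raise ValueError(f"unexpected test columns: {test_cols}")
    # Train-only columns: features must be derived for test rows from other data.
    missing = [c for c in train_cols if c not in test_cols]
    return test_cols, missing


if __name__ == "__main__":
    test_cols, missing = check_columns(TRAIN_CSV, TEST_CSV)
    print("test columns:", test_cols)
    print("train-only columns:", missing)
```

Raising on an unexpected header makes a stale or wrong test file (like the earlier 1-column version) fail loudly instead of silently producing a bad submission.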

Juliuss

Freelance

Thank you for the response. Well noted.

replied to apugliese · 27 Jul 2023, 09:05


sarthak_mehra

Hi, I am still unsure about the test data. Are we supposed to use only lat, lng, and season to build the model? This is very confusing...

replied to apugliese · 3 Sep 2023, 16:52


apugliese

Hello, you could, but that is not the intent. The idea is to use the meteorological and topographic data provided, as well as any other open data you find useful for that purpose. The test file contains only lat, lng, and season precisely to encourage this.

replied to sarthak_mehra · 18 Sep 2023, 08:18

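One way to follow this advice is to attach the provided meteorological and topographic variables to each test point by location and season. Below is a minimal sketch of a plain nearest-neighbour join using the haversine distance. It assumes the feature data can be reduced to points carrying lat, lng, season, and numeric values; the FeaturePoint class, the season labels, and all sample values are hypothetical, and interpolation or raster sampling may suit the real data better.

```python
import math
from dataclasses import dataclass


@dataclass
class FeaturePoint:
    """One hypothetical row of gridded meteorological/topographic data."""
    lat: float
    lng: float
    season: str
    features: dict[str, float]


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def attach_features(test_rows: list[dict], grid: list[FeaturePoint]) -> list[dict]:
    """For each (ID, lat, lng, season) test row, copy the features of the
    nearest grid point from the same season; returns merged dicts."""
    out = []
    for row in test_rows:
        candidates = [g for g in grid if g.season == row["season"]]
        if not candidates:
            raise ValueError(f"no feature data for season {row['season']!r}")
        nearest = min(
            candidates,
            key=lambda g: haversine_km(row["lat"], row["lng"], g.lat, g.lng),
        )
        out.append({**row, **nearest.features})
    return out


# Hypothetical sample data (not from the competition files).
GRID = [
    FeaturePoint(41.9, 12.5, "winter", {"temperature": 8.0, "elevation": 21.0}),
    FeaturePoint(45.5, 9.2, "winter", {"temperature": 3.0, "elevation": 120.0}),
    FeaturePoint(45.5, 9.2, "summer", {"temperature": 25.0, "elevation": 120.0}),
]
TEST_ROWS = [{"ID": "B1", "lat": 45.4, "lng": 9.1, "season": "winter"}]

if __name__ == "__main__":
    print(attach_features(TEST_ROWS, GRID))
```

With real data, GRID would be built from the provided files and TEST_ROWS from test.csv. For large grids, a spatial index such as a k-d tree would replace the linear scan inside min().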

cephars

Free Lance

I have not tried it yet, but I'm starting to see the logic. I guess the other datasets are meant to serve as our test dataset. I hope this is what you all did.

replied to apugliese · 3 Oct 2023, 11:25


Juliuss

Freelance

👏🏾 How are you getting a perfect score, @xiaoironman and @yanteixeira?

12 Aug 2023, 08:27


xiaoironman

I think the season info plays an important role here, but it might also be luck. When scored on the remaining 80% of the test data, the accuracy may well be lower, so I'm not quite sure.

replied to Juliuss · 13 Aug 2023, 12:22


Juliuss

Freelance

If your model gets a perfect score on the 20%, it most probably generalizes well to unseen data in most cases... hopefully the distribution in this tiny test set is balanced. Will try to catch up with you, xiaoironman :)

Good luck

replied to xiaoironman · 13 Aug 2023, 13:43

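The "could be luck" point can be put into numbers with a back-of-the-envelope calculation. Assuming each public test row is scored independently and the model's true per-row accuracy is p, the chance of a perfect score on n public rows is p ** n. The thread gives neither n nor the exact competition metric, so the accuracies and split sizes below are purely illustrative.

```python
def prob_perfect(true_accuracy: float, n_public: int) -> float:
    """Probability that a model with the given true per-row accuracy gets all
    n_public rows right, assuming rows are scored independently."""
    return true_accuracy ** n_public


if __name__ == "__main__":
    # Hypothetical public-split sizes and true accuracies, for illustration only.
    for n in (10, 50, 200):
        for p in (0.90, 0.95, 0.99):
            print(f"n={n:4d}  p={p:.2f}  P(perfect)={prob_perfect(p, n):.3f}")
```

For example, a model that is right 90% of the time still scores perfectly on a 10-row public split with probability 0.9 ** 10, roughly 0.35. On a small 20% split, a perfect score therefore says little on its own, which is also why a balanced class distribution in that split matters.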

cephars

Free Lance

I have seen you on the leaderboard. How did you resolve this?

3 Oct 2023, 11:12


cephars

Free Lance

I mean the test dataset?

replied to cephars · 3 Oct 2023, 11:14

