@zindi says:
"Test resembles Train.csv but without the target-related columns. This is the dataset on which you will apply your model to."
On closer inspection, the test file has only 1 column while the train file has 62 columns. Am I missing something here?
A variable definition file explaining the datasets and what each column means would really help here. Any help, @Zindi?
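For anyone who wants to reproduce the check, here is a minimal sketch of comparing the headers of two CSV files. The column names and the in-memory files below are made up for illustration; they are not the real competition columns, so swap in the actual files opened with `open(...)`.

```python
import csv
import io


def read_header(fileobj):
    """Return the column names from the first row of a CSV file object."""
    return next(csv.reader(fileobj))


def compare_columns(train_cols, test_cols):
    """Return (columns only in train, columns only in test), keeping order."""
    train_set, test_set = set(train_cols), set(test_cols)
    only_train = [c for c in train_cols if c not in test_set]
    only_test = [c for c in test_cols if c not in train_set]
    return only_train, only_test


# Tiny in-memory stand-ins; these column names are hypothetical.
train_csv = io.StringIO("ID,lat,lng,season,rainfall,target\n1,0.5,36.1,wet,12.0,1\n")
test_csv = io.StringIO("ID\n9\n")

only_train, only_test = compare_columns(read_header(train_csv), read_header(test_csv))
print("only in train:", only_train)
print("only in test:", only_test)
```

Anything that shows up under "only in train" and is not target-related is a column the test file would be expected to have as well.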
Had the same question! Thanks for asking! And also, what does the ImageID mean? And the following 8 digits... is it a special code or something?
Yeah, this is another issue. Supposing the test file provided is not the correct one and the correct one is eventually uploaded, maybe its ID column will match the one in the submission file and remove the inconsistency.
Thank you for pointing this out. The previous test.csv file was incorrect and has now been updated. Note, however, that the new file has only 4 columns: ID, lat, lng, and season. This was done on purpose, so that contestants decide and design how they are going to use the training data (either the provided data or other open data).
Thank you for the response. This is well noted.
Hi, I am still unsure about the test data. Are we supposed to use only lat, lng and season to build the model? This is very confusing...
Hello, you could, but you are not expected to. The idea is to use the meteorological and topographic data provided, as well as any other open data that you find useful for that purpose. The test file contains only lat, lng, and season precisely to encourage this approach.
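One way to read this is to treat lat, lng and season as lookup keys into the other tables. The sketch below is only an illustration under stated assumptions: it supposes a hypothetical table of external features keyed by location and season (the `rainfall` and `elevation` fields are invented, not taken from the competition data) and attaches to each test row the features of the closest point in the same season, using haversine distance. It is not the organisers' method.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def attach_nearest(test_rows, feature_rows):
    """Copy features from the nearest same-season point onto each test row."""
    enriched = []
    for row in test_rows:
        candidates = [f for f in feature_rows if f["season"] == row["season"]]
        if not candidates:
            # No feature point for this season: keep the row unchanged.
            enriched.append(dict(row))
            continue
        nearest = min(
            candidates,
            key=lambda f: haversine_km(row["lat"], row["lng"], f["lat"], f["lng"]),
        )
        extra = {k: v for k, v in nearest.items() if k not in ("lat", "lng", "season")}
        enriched.append({**row, **extra})
    return enriched


# Hypothetical data: one test row and three external feature points.
test_rows = [{"ID": "a", "lat": 0.0, "lng": 36.0, "season": "wet"}]
feature_rows = [
    {"lat": 0.1, "lng": 36.0, "season": "wet", "rainfall": 120.0, "elevation": 1500.0},
    {"lat": 2.0, "lng": 38.0, "season": "wet", "rainfall": 40.0, "elevation": 300.0},
    {"lat": 0.0, "lng": 36.0, "season": "dry", "rainfall": 5.0, "elevation": 1490.0},
]
enriched = attach_nearest(test_rows, feature_rows)
print(enriched[0])
```

The same lookup applied to the training locations would give train and test rows the same feature columns, which is what a model needs.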
I have not tried it, but I am starting to see the logic. I guess the other datasets are meant to be our test dataset. I hope this is what you guys did.
👏🏾 How are you guys getting a perfect score, @xiaoironman and @yanteixeira?
I think the season info plays an important role here, but it might also be down to luck. When tested on the remaining 80% of the test data, the accuracy may well be lower, so I'm not quite sure.
If your model gets a perfect score on the 20%, it most probably means it is generalizing well to unseen data in most cases... hopefully there is a balanced distribution in this tiny test set. Will try to catch up with you, xiaoironman :)
Good luck
I have seen you guys on the leaderboard. How did you resolve this?
I mean the test dataset?