After a quick exploration, I found that s2_images.zip contains 3489 images in the training directory and 1022 images in the test directory. However, the train.csv and test.csv files contain 11436 and 3384 tifPath entries, respectively, so a large number of the listed image files appear to be missing. The question now is whether we can extract additional images beyond the 3489 and 1022 provided in s2_images.zip, to match the number of tifPath entries in the CSV files. If the answer is yes, will this affect the final submission file from my model? Could there be an error when uploading? Does the submission file need to have exactly 1022 or 3384 rows, or can it be of any shape?
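For anyone who wants to reproduce this comparison, here is a minimal sketch. It assumes the CSV has a `tifPath` column whose file names match the extracted `.tif` files on disk; the function name and the folder layout in the usage comment are hypothetical. It compares unique file names, so the count may come out lower than the number of CSV rows if a name repeats.

```python
import csv
from pathlib import Path


def find_missing_images(csv_path, image_dir, path_column="tifPath"):
    """Compare file names listed in a CSV against .tif files found on disk.

    Returns three sets of file names: listed in the CSV, present on disk,
    and listed but not present.
    """
    with open(csv_path, newline="") as f:
        # Keep only the file name so differing directory prefixes do not matter.
        listed = {Path(row[path_column]).name for row in csv.DictReader(f)}
    # Search the image folder recursively for extracted .tif files.
    present = {p.name for p in Path(image_dir).rglob("*.tif")}
    return listed, present, listed - present


# Usage (paths are assumptions, adjust to where you extracted the data):
# listed, present, missing = find_missing_images("train.csv", "s2_images/train")
# print(len(listed), len(present), len(missing))
```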
Valid question. I thought the submission format only took ID and class?
You raised a valid concern. I noticed that the provided images cover only about three months of data for each ID. Also, the IDs in the test set and the provided submission file differ from the ones in the previously released files. So people who downloaded 12 months of data using the previous file may find it hard to match those IDs to the new ones. Probably, downloading more data would require using the bbox in the .tif images? 🤔
Yeah, @Gozie @nymfree, I think any submission file shape will be allowed since we are permitted to extract more images. Also, the generalization of the final solution is the most important factor here, because we're not restricting our solution to just 'Cote d'Ivoire', the current location. Our goal is to make the solution generalizable to all West African countries.
In the original test.geojson and samplesubmission.csv we have 282 IDs. The provided S2 images are sampled once a month (a total of 12 months in 2024). 282 x 12 = 3384.
I think that one is free to sample S2 images as often as they want, but the final prediction would be done per region specified in test.geojson, and each region corresponds to one ID. The final shape of the submission file should be 282 x 2, i.e., 282 rows and two columns.
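If you predict per image, you still have to collapse those predictions to one row per ID before submitting. Below is a rough sketch of one way to do it. Majority voting is my assumption here, not something the organizers specified (averaging class probabilities would be another option), and the `ID` / `class` header names are placeholders that should be copied from samplesubmission.csv.

```python
import csv
from collections import Counter, defaultdict


def aggregate_by_id(image_predictions):
    """Collapse (region_id, predicted_class) pairs into one class per region.

    Uses a simple majority vote over all images of the same region.
    """
    votes = defaultdict(Counter)
    for region_id, predicted_class in image_predictions:
        votes[region_id][predicted_class] += 1
    return {rid: counts.most_common(1)[0][0] for rid, counts in votes.items()}


def write_submission(per_id, out_path, header=("ID", "class")):
    """Write one row per region ID; the header names are placeholders."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for region_id in sorted(per_id):
            writer.writerow([region_id, per_id[region_id]])
```

With 282 regions in test.geojson, the resulting file should have 282 data rows no matter how many images were sampled per region.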
Since the band values for the rainy months won't differ much from one month to the next, you could try filling the missing month values using linear interpolation.
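Here is a small sketch of that idea in plain Python, assuming you have one value per month for a given band and region, with `None` marking the months that have no image. The function name is made up for illustration, and filling gaps at the start or end of the year with the nearest observed value is my own choice, since there is nothing to interpolate between there.

```python
def interpolate_missing_months(values):
    """Fill None entries in a monthly series by linear interpolation.

    Gaps between two observed months are filled on a straight line between
    them; gaps before the first or after the last observation are filled
    with the nearest observed value.
    """
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one observed month")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [(j, w) for j, w in known if j < i]
        after = [(j, w) for j, w in known if j > i]
        if before and after:
            (j0, v0), (j1, v1) = before[-1], after[0]
            # Straight line between the two nearest observed months.
            filled[i] = v0 + (v1 - v0) * (i - j0) / (j1 - j0)
        else:
            # Edge gap: copy the nearest observed value.
            filled[i] = (before[-1] if before else after[0])[1]
    return filled
```

If your data is already in a pandas DataFrame, `Series.interpolate(method="linear")` covers the same case for interior gaps.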