Sorry for a possibly dumb question. The field ID rasters provided also contain values of 0. Are those to be ignored?
On running this code:
import numpy as np
import rasterio

for i in range(4):
    # Path to the field ID raster for tile i
    fp_path1 = f'/content/drive/My Drive/ICLR_SEG/0{i}/{i}_field_id.tif'
    with rasterio.open(fp_path1) as raster1:
        band1 = raster1.read(1)
    # Count the distinct field IDs in this tile, excluding 0
    n_fields = np.unique(band1[band1 != 0]).size
    print(f'Total number of fields in tile {i} is:', n_fields)
The output is:
Total number of fields in tile 0 is: 591
Total number of fields in tile 1 is: 757
Total number of fields in tile 2 is: 256
Total number of fields in tile 3 is: 3084
Adding these up, the total number of fields comes to 4688. That is less than the 4797 that is mentioned. What am I missing?
I would appreciate a reply.
Hmm, I hadn't noticed the discrepancy. To be honest I haven't looked much at this challenge yet. I had a few missing predictions (must be the missing fields) and just filled the 34 missing values with 1/7 ¯\_(ツ)_/¯
I'll see what's up on Monday - for now, I'd say just ignore those few missing ones.
Thanks a lot for your prompt response.
Thanks Aninda for raising this. I'm seeing the same issue: I get 4,688 as the total number of unique Field IDs (excluding 0).
In the test set, these 34 unique field IDs are listed in SampleSub.csv but missing from the combined field_id.tif files:
[64, 140, 274, 354, 784, 917, 985, 1401, 1516, 1581, 2092, 2307, 2546, 2555, 2562, 2812, 3084, 3144, 3318, 3458, 3838, 3965, 3966, 3990, 4096, 4098, 4212, 4279, 4280, 4351, 4361, 4366, 4709, 4778]
Assuming the complete set of Field IDs is meant to range from 1 to 4797, the other 75 missing IDs are in the train set:
[ 7, 21, 48, 113, 266, 275, 349, 530, 1062, 1179, 1338, 1445, 1459, 1684, 1766, 1777, 1796, 1903, 1928, 1937, 2042, 2076, 2091, 2120, 2121, 2590, 2599, 2735, 2792, 2937, 2951, 2973, 3043, 3050, 3077, 3157, 3228, 3339, 3346, 3363, 3388, 3390, 3496, 3519, 3534, 3569, 3588, 3591, 3645, 3659, 3685, 3763, 3825, 3831, 3855, 3887, 3892, 3958, 4043, 4063, 4095, 4114, 4199, 4285, 4304, 4317, 4318, 4338, 4375, 4376, 4446, 4528, 4603, 4689, 4704]
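For anyone who wants to reproduce lists like the ones above, here is a minimal sketch of the comparison. It assumes the tiles have already been read into NumPy arrays (for example with rasterio) and that the expected IDs come from somewhere like SampleSub.csv; the function name `missing_field_ids` and the toy arrays are made up for illustration.

```python
import numpy as np

def missing_field_ids(expected_ids, tiles):
    """Return the expected IDs that appear in none of the tile arrays.

    expected_ids: iterable of integer field IDs (hypothetical input,
                  e.g. the IDs listed in a submission file).
    tiles: iterable of 2-D integer arrays of field IDs, where 0 is
           assumed to mean "no field".
    """
    present = set()
    for tile in tiles:
        # Collect the distinct non-zero IDs found in this tile
        present.update(np.unique(tile[tile != 0]).tolist())
    return sorted(set(expected_ids) - present)

# Toy stand-ins for the real field_id.tif tiles
demo_tiles = [
    np.array([[0, 1, 1], [2, 0, 0]]),
    np.array([[0, 4, 4], [0, 5, 5]]),
]
demo_missing = missing_field_ids(range(1, 7), demo_tiles)
print(demo_missing)
```

With the toy tiles, IDs 3 and 6 never appear in any array, so they are the ones reported as missing.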
Hmm, also to add to the above: is it correct that most of the 'fields' are only a few pixels in size? When I was looking at the field_id tile, it seemed that a lot of pixels have ID 0, and that each ID > 0 usually covers fewer than 10 pixels...?
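One way to check the field sizes is to count pixels per ID. This is only a sketch on a toy array: it assumes 0 means "no field", and `field_pixel_counts` is a made-up helper name. On the real data the array would come from reading a field_id.tif band.

```python
import numpy as np

def field_pixel_counts(field_ids):
    """Map each non-zero field ID to the number of pixels it covers."""
    ids, counts = np.unique(field_ids[field_ids != 0], return_counts=True)
    return dict(zip(ids.tolist(), counts.tolist()))

# Toy stand-in for one band of a field_id.tif tile
demo_band = np.array([
    [0, 0, 1, 1],
    [0, 2, 1, 0],
    [3, 3, 3, 0],
])
demo_counts = field_pixel_counts(demo_band)
print(demo_counts)

# Share of pixels that belong to no field at all
background_share = float((demo_band == 0).mean())
print(background_share)
```

Taking the median of the counts, or plotting a histogram of them, would show whether most fields really cover fewer than 10 pixels.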
Thanks Dave for bringing this up. I am a little bit confused with
Thanks wwymak for this. This is another thing in the data that was not very clear to me. Looking forward to someone on the leaderboard kindly explaining.
We are looking into this and will share an update soon.
We have the correct number of fields in the vector layers of the ground reference data, and that’s where we generated the field ID list from (for both training and test). However, some of the fields are very long and narrow (even though their area is large), and during rasterization they receive no pixels, because their width is less than one Sentinel-2 pixel. This caused those fields to disappear from the data shared with users. Our conclusion was to drop them from the train and test lists, as they cannot be mapped to the Sentinel-2 grid.
There are now 3,286 fields in the train set and 1,402 fields in the test set.
Thank you for bringing this to our attention and good luck with the challenge!
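To illustrate how a large field can end up with no pixels, here is a simplified sketch. It is not the organizers' actual rasterization pipeline: it assumes a 10 m grid and a rule where a pixel is assigned to a field only if the pixel centre falls inside the field, and it uses axis-aligned rectangles as stand-in fields. The function name and all coordinates are hypothetical.

```python
import numpy as np

PIXEL_SIZE = 10.0  # metres; assumed grid spacing for this illustration

def pixels_covered(xmin, ymin, xmax, ymax, width=100, height=100,
                   pixel_size=PIXEL_SIZE):
    """Count pixels whose centre lies inside an axis-aligned rectangle."""
    xs = (np.arange(width) + 0.5) * pixel_size   # pixel-centre x coordinates
    ys = (np.arange(height) + 0.5) * pixel_size  # pixel-centre y coordinates
    in_x = (xs >= xmin) & (xs <= xmax)
    in_y = (ys >= ymin) & (ys <= ymax)
    return int(in_x.sum() * in_y.sum())

# A 4 m x 500 m strip (2000 m^2) lying between two columns of pixel centres
narrow_pixels = pixels_covered(6.0, 0.0, 10.0, 500.0)

# A roughly square field of about the same area (44.7 m x 44.7 m)
square_pixels = pixels_covered(0.0, 0.0, 44.7, 44.7)

print(narrow_pixels, square_pixels)
```

Under these assumptions the strip covers no pixel centre at all, while the square field of similar area covers 16, which matches the explanation that narrow fields vanish from the raster even when their area is large.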