According to Variabledefenitions.csv, the PID is the unique identifier of the soil sample site, but in the satellite data files there are multiple longitude/latitude values for a single PID. What does that mean? Did they take the same sample from different places? If I'm the one who is wrong, please correct me, Zindians. And if you're willing, please share the method you used to merge the satellite data files with the train and test files. Thank you.
What's the largest difference in lat/lon for a single PID you have observed?
The largest one I could find in the Landsat data (quick check) is about 17 m in latitude, so I think for practical purposes it corresponds to the same area. I wouldn't worry about this. Join on PID if that is easiest for you.
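In case it helps, here is a minimal sketch of what "join on PID" could look like when a PID has several satellite rows: average the satellite columns per PID first, then attach them to the train rows. The rows and the column names (`target`, `band`, `lat`, `lon`) are made up for illustration and are not the competition's real columns. It uses only the standard library; in pandas the same idea would be a `groupby("PID")` aggregation followed by a `merge` on `PID`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; the real files have different and many more columns.
train = [
    {"PID": "site_A", "target": 1.2},
    {"PID": "site_B", "target": 0.7},
]
satellite = [
    {"PID": "site_A", "lat": -1.000000, "lon": 36.000000, "band": 0.30},
    {"PID": "site_A", "lat": -1.000152, "lon": 36.000100, "band": 0.34},
    {"PID": "site_B", "lat": -2.500000, "lon": 37.250000, "band": 0.50},
]


def aggregate_by_pid(rows, value_cols):
    """Average the given columns over all rows that share a PID."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["PID"]].append(row)
    return {
        pid: {col: mean(r[col] for r in group) for col in value_cols}
        for pid, group in grouped.items()
    }


def join_on_pid(left_rows, features_by_pid):
    """Left join: keep every left row, add features when the PID is known."""
    return [{**row, **features_by_pid.get(row["PID"], {})} for row in left_rows]


# One feature row per PID, then one merged row per train row.
sat_features = aggregate_by_pid(satellite, ["band"])
merged = join_on_pid(train, sat_features)
```

Averaging is only one choice here; taking the first row per PID or the median would work the same way.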
-----
Calculation:
We know Earth's circumference is about 40,075,000 m and there are 360 degrees in a circle. Thus each degree of latitude is roughly 40,075,000 m / 360 degrees = 111,320 m/degree. The largest change in latitude I observed for the same PID was 0.000152 degrees, which converted to metres would be 0.000152 degrees * 111,320 m/degree = 16.9 m.
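The same check can be sketched in Python with the haversine formula, which takes both latitude and longitude into account. This is a rough sketch: it assumes a spherical Earth with a mean radius of 6,371 km, and the coordinates in the example are invented, with only the 0.000152 degree latitude gap taken from the calculation above.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def max_spread_m(points):
    """Largest pairwise distance among (lat, lon) points of one PID."""
    return max(
        (haversine_m(*p, *q) for p, q in combinations(points, 2)),
        default=0.0,
    )


# Latitude-only gap of 0.000152 degrees (longitude unchanged).
lat_only = haversine_m(0.0, 36.0, 0.000152, 36.0)

# Hypothetical case where longitude differs by the same amount as well.
both = haversine_m(0.0, 36.0, 0.000152, 36.000152)
```

Running `max_spread_m` over the list of `(lat, lon)` pairs for each PID gives the largest separation per site. Note that a degree of longitude covers less ground away from the equator, by a factor of cos(latitude).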
You should consider both longitude and latitude. The distance doesn't seem to be as small as 17 metres. Try it, but if I'm wrong please let me know.
I think the reason you are seeing these discrepancies is that the satellite data comes from pixels that normally have a resolution between 30 m and 1 km. Likely what has happened is that Zindi has taken the centroid of each pixel from the remote sensing data and used that for the table. The PID is probably just the closest target ID to that centroid. Thus the linked data is the best available satellite data for the target.
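To illustrate the idea (this is a guess at the mechanism, not Zindi's actual pipeline), here is a toy sketch that snaps one site to the centre of the grid cell containing it, for two pixel sizes. It assumes a simple lat/lon grid aligned to the origin, which real satellite products generally do not use, and the site coordinates and pixel sizes are invented.

```python
import math


def pixel_centroid(lat, lon, pixel_deg):
    """Centre of the grid cell (side pixel_deg degrees) containing the point."""
    row = math.floor(lat / pixel_deg)
    col = math.floor(lon / pixel_deg)
    return ((row + 0.5) * pixel_deg, (col + 0.5) * pixel_deg)


site = (-1.00010, 36.00020)  # hypothetical sample location

# Roughly 30 m and roughly 1 km pixels, expressed in degrees at the equator.
fine = pixel_centroid(*site, pixel_deg=0.00027)
coarse = pixel_centroid(*site, pixel_deg=0.009)
```

The same site ends up with a different centroid for each pixel size, so one PID can carry several slightly different lat/lon pairs across the satellite files without the sample itself having moved.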