There are many missing PIDs in the Sentinel-2 DataFrame. When I try to find the closest point using longitude and latitude, the nearest Sentinel-2 match for some of the missing PIDs is too far away to be a plausible match. Why is there such a large difference between the training dataset and the Sentinel-2 data? I need help understanding this issue.
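For anyone trying the same approach, here is a minimal sketch of nearest-point matching by latitude/longitude that also flags matches that are too far away. It assumes each dataset has already been reduced to a dict mapping PID to `(lat, lon)` in decimal degrees; the function names and the `max_km` threshold are hypothetical choices for illustration, not anything provided by the competition.

```python
import math

# Mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_match(train_points, s2_points, max_km=1.0):
    """For each training PID, find the closest Sentinel-2 PID by brute force.

    train_points, s2_points: hypothetical dicts mapping PID -> (lat, lon).
    Returns a dict PID -> (nearest_s2_pid, distance_km, accepted), where
    accepted is False when the nearest point is farther than max_km.
    """
    result = {}
    for pid, (lat, lon) in train_points.items():
        best_pid, best_d = None, float("inf")
        for s2_pid, (s_lat, s_lon) in s2_points.items():
            d = haversine_km(lat, lon, s_lat, s_lon)
            if d < best_d:
                best_pid, best_d = s2_pid, d
        result[pid] = (best_pid, best_d, best_d <= max_km)
    return result


if __name__ == "__main__":
    train = {"A": (0.0, 36.0), "B": (5.0, 36.0)}
    s2 = {"X": (0.0, 36.005), "Y": (1.0, 36.0)}
    for pid, (match, dist, ok) in nearest_match(train, s2).items():
        print(pid, match, round(dist, 2), "accepted" if ok else "too far")
```

The brute-force loop is fine for a quick check; for large tables, scikit-learn's `BallTree` with `metric="haversine"` (inputs in radians) does the same search much faster. Looking at the distribution of `distance_km` for the rejected PIDs is a quick way to see whether the gap is a few hundred metres or whole degrees, which points to a coordinate problem rather than a matching problem.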
There are lots of issues with the data in this contest. I don't think matching by lat/lon distance will be useful, since the coordinates don't match the information given on the overview page. See this discussion: [location related query](https://zindi.africa/competitions/amini-soil-prediction-challenge/discussions/26202).
You can use the Landsat data instead. It is very similar to the Sentinel-2 data, and it has enough matching IDs in both the training and test sets.
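Before switching, it may be worth checking the ID coverage of each satellite source directly. A minimal sketch, assuming you have already pulled the ID columns out as plain lists (the function name and the example IDs are hypothetical):

```python
def id_coverage(target_ids, source_ids):
    """Return (fraction of target IDs found in source_ids, sorted list of missing IDs).

    Returns 0.0 coverage for an empty target list to avoid dividing by zero.
    """
    target = set(target_ids)
    source = set(source_ids)
    missing = target - source
    coverage = len(target & source) / len(target) if target else 0.0
    return coverage, sorted(missing)


if __name__ == "__main__":
    train_ids = ["P1", "P2", "P3", "P4"]
    landsat_ids = ["P1", "P2", "P3"]
    sentinel_ids = ["P1"]
    print("Landsat:", id_coverage(train_ids, landsat_ids))
    print("Sentinel-2:", id_coverage(train_ids, sentinel_ids))
```

With pandas the same check is roughly `train["PID"].isin(landsat["PID"]).mean()`, assuming both DataFrames use a column named `PID`; adjust to whatever the actual files use. Running it for the training and test sets against both sources shows which one leaves fewer gaps.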