Looking through the train dataset, the air-pollution dataset and the seasonal meteorological datasets, I noticed an inconsistency: the train dataset was recorded primarily in the metropolitan area rather than the city itself, while the air-pollution and seasonal meteorological datasets are concentrated in the city rather than the wider metropolitan area. None of the locations in the air-pollution dataset or in any of the 4 seasonal datasets match any location in the train dataset.
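One way to check the "no matching locations" observation is to round each dataset's coordinates to a fixed precision and intersect the resulting sets. Below is a minimal sketch that assumes each dataset has already been reduced to a list of (lat, lon) pairs. The function names, the 4-decimal precision and the coordinates are hypothetical and only illustrate the idea.

```python
def rounded_locations(rows, ndigits=4):
    """Return the set of (lat, lon) pairs rounded to `ndigits` decimals."""
    return {(round(lat, ndigits), round(lon, ndigits)) for lat, lon in rows}


def shared_locations(train_rows, external_rows, ndigits=4):
    """Locations present in both datasets (an empty set means no overlap)."""
    return rounded_locations(train_rows, ndigits) & rounded_locations(external_rows, ndigits)


# Made-up coordinates for illustration only (not taken from the competition data).
train = [(10.10, 20.90), (10.20, 21.10)]
external = [(10.00, 20.95), (10.05, 21.00)]
print(len(shared_locations(train, external)))  # 0
```

A count of 0 confirms that the two datasets cannot be joined on exact coordinates, so some form of spatial matching or interpolation is needed.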
Is this correct? Is it intended? If so, won't it be difficult to combine any of the external datasets with the train dataset?
I noticed the same thing. I'm not sure, but I think we may have to apply an interpolation technique to create new training data; I feel Inverse Distance Weighting (IDW) would be effective.
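For reference, IDW estimates the value at a target point as a weighted average of known values, with weights 1/d^p, where d is the distance to each known point and p is a power parameter (2 is a common choice). Below is a minimal sketch assuming plain (x, y, value) tuples and Euclidean distance. The function name and inputs are hypothetical; on raw latitude/longitude, Euclidean distance is only an approximation.

```python
import math


def idw(known_points, target, power=2.0):
    """Inverse Distance Weighting estimate at `target`.

    known_points: non-empty list of (x, y, value) tuples, e.g. station coordinates
                  and a measurement (hypothetical input format).
    target: (x, y) where the value should be estimated.
    power: distance exponent; larger values favour the nearest points more strongly.
    """
    num = 0.0
    den = 0.0
    tx, ty = target
    for x, y, value in known_points:
        d = math.hypot(x - tx, y - ty)
        if d == 0.0:
            # Target coincides with a known point: return its value exactly.
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(stations, (1.0, 0.0)))  # 15.0 (equidistant, so a plain average)
```

The zero-distance check avoids a division by zero and keeps the interpolated surface equal to the observed values at the stations themselves.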
The training dataset was recorded primarily in the metropolitan area rather than the city itself because there are more training points in the metropolitan area, and the model can later be applied to the city itself. That is why the seasonal datasets are provided only for the city.
The locations in the seasonal datasets do not match those of the training dataset because the training data is provided only at in-situ station points, while the seasonal datasets are a regular grid of interpolated data.
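Because the seasonal data is a regular grid, one simple way to attach it to the station points is to take, for each station, the value of the nearest grid cell by great-circle (haversine) distance. Below is a minimal sketch assuming stations are (lat, lon) pairs and grid cells are (lat, lon, value) tuples in degrees. The function names and sample values are hypothetical, and nearest-cell lookup is just one option (averaging or interpolating the surrounding cells is another).

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_grid_cell(station, grid):
    """Return (cell, distance_km) for the grid cell closest to `station`.

    station: (lat, lon); grid: non-empty list of (lat, lon, value) tuples.
    """
    lat, lon = station
    best = min(grid, key=lambda cell: haversine_km(lat, lon, cell[0], cell[1]))
    return best, haversine_km(lat, lon, best[0], best[1])


# Made-up grid values for illustration only.
grid = [(10.0, 20.0, 15.2), (10.0, 20.5, 14.8), (10.5, 20.0, 13.9)]
cell, dist = nearest_grid_cell((10.1, 20.4), grid)
print(cell[2])  # 14.8
```

Keeping the returned distance makes it easy to flag stations that lie far outside the grid (for example, in the metropolitan area beyond the city), where the nearest-cell value may not be representative.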
The train dataset seems to have many identical entries, for example the 1st and 2nd rows and the 3rd and 4th rows. Is there a particular reason for that, or should we remove the repeated rows?
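If the duplicates turn out not to carry any meaning (for example, not separate measurements that happen to coincide), exact repeats can be dropped while keeping the first occurrence. Below is a minimal standard-library sketch assuming each record is a tuple containing all columns. The function name and sample rows are hypothetical. With pandas, `DataFrame.drop_duplicates()` does the same thing and keeps the first occurrence by default.

```python
def drop_exact_duplicates(rows):
    """Return rows with exact duplicates removed, keeping the first occurrence.

    rows: list of hashable tuples, one per record, with all columns included.
    """
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Made-up rows mimicking "row 1 == row 2, row 3 == row 4".
rows = [
    ("station_a", "2020-01-01", 12.0),
    ("station_a", "2020-01-01", 12.0),
    ("station_b", "2020-01-01", 9.5),
    ("station_b", "2020-01-01", 9.5),
]
print(len(drop_exact_duplicates(rows)))  # 2
```

Comparing the row count before and after deduplication also gives a quick measure of how widespread the repetition is.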