In the provided weather data, each weather station has a unique ID (LOC_ID) and X and Y coordinates. For each day, the temperature (TEMP) is recorded in the dataset. The data has been preprocessed to remove missing values.
Two files are provided for this competition:
- Train - the train dataset that can be used for developing your model.
- Test - the test dataset for which you will predict the weather temperature. The evaluation metric will be calculated on the test dataset.
Number of records in each file
- Train has 1,449,100 rows and 5 columns
- Test has 391,661 rows and 4 columns (The target is not provided)
File format
The train dataset is provided as a comma-delimited file with the following format:
Date,LOC_ID,X,Y,TEMP
2020-09-14,8695,0.117571677905505,-0.1312422060557518,79.8
2020-12-15,1831,0.1615036231279796,-0.0282468070433892,30.2
2019-01-22,3436,0.330747584291955,0.0251439480766755,15.2
2019-03-17,7280,-0.4700605266230696,0.0493091912565244,55.2
2018-10-05,1057,-0.0971386908259038,0.0108517790792007,42.9
The Date column gives the date of each reported reading. Three years of data are provided. For each day, the temperature is reported by a set of weather stations, each identified by its LOC_ID and its X and Y coordinates. The target is the temperature (TEMP).
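
As an illustration of the format, the following is a minimal sketch of parsing train rows into typed records using only the Python standard library. The `Reading` type and the `read_train` function are hypothetical names introduced here, not part of the competition materials, and the sample rows are copied from the format description above.

```python
import csv
import io
from datetime import date, datetime
from typing import Iterable, List, NamedTuple


class Reading(NamedTuple):
    """One row of the train file (hypothetical helper type)."""
    day: date
    loc_id: int
    x: float
    y: float
    temp: float


def read_train(lines: Iterable[str]) -> List[Reading]:
    """Parse rows in the Date,LOC_ID,X,Y,TEMP layout into typed records."""
    reader = csv.DictReader(lines)
    return [
        Reading(
            day=datetime.strptime(row["Date"], "%Y-%m-%d").date(),
            loc_id=int(row["LOC_ID"]),
            x=float(row["X"]),
            y=float(row["Y"]),
            temp=float(row["TEMP"]),
        )
        for row in reader
    ]


# Sample rows copied from the format description; a real run would pass
# an open file handle for the train file instead (its name is not given here).
SAMPLE = """Date,LOC_ID,X,Y,TEMP
2020-09-14,8695,0.117571677905505,-0.1312422060557518,79.8
2020-12-15,1831,0.1615036231279796,-0.0282468070433892,30.2
2019-01-22,3436,0.330747584291955,0.0251439480766755,15.2
"""

readings = read_train(io.StringIO(SAMPLE))
```

The test file has the same layout without the TEMP column, so a similar reader for it would omit the `temp` field.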