Full transparency: without grabbing a ton of external datasets, I doubt it is possible to get good scores. So here is a rundown of all the datasets I grabbed/used. I am not sure they are all useful or even necessary, and these are just the raw features; some light feature engineering comes on top of this.
Raw inputs contain: precipitation, relative humidity, temperature, station metadata (lat/lon/elevation/install height/country).
Cosine of solar zenith angle, extraterrestrial radiation (ETR), solar elevation, hour angle, declination: all derivable from timestamp + lat/lon. pvlib works too, but it is not really required; closed-form formulas are accurate to <1%.
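For anyone who wants to skip pvlib, here is a minimal sketch of the closed-form route. It assumes the Spencer-series approximations (the same ones NOAA's solar calculator uses) and a solar constant of 1361 W/m²; the function name and the returned keys are made up for illustration, and this is not necessarily what my pipeline does line for line.

```python
import math
from datetime import datetime, timezone

SOLAR_CONSTANT = 1361.0  # W/m^2; assumed value, not taken from the post


def solar_geometry(ts_utc, lat_deg, lon_deg):
    """Closed-form solar geometry. ts_utc must already be expressed in UTC."""
    day_of_year = ts_utc.timetuple().tm_yday
    hour = ts_utc.hour + ts_utc.minute / 60 + ts_utc.second / 3600
    # Fractional year in radians
    g = 2 * math.pi / 365 * (day_of_year - 1 + (hour - 12) / 24)

    # Solar declination in radians (Spencer series)
    decl = (0.006918 - 0.399912 * math.cos(g) + 0.070257 * math.sin(g)
            - 0.006758 * math.cos(2 * g) + 0.000907 * math.sin(2 * g)
            - 0.002697 * math.cos(3 * g) + 0.00148 * math.sin(3 * g))
    # Equation of time in minutes
    eot = 229.18 * (0.000075 + 0.001868 * math.cos(g) - 0.032077 * math.sin(g)
                    - 0.014615 * math.cos(2 * g) - 0.040849 * math.sin(2 * g))

    # True solar time in minutes, then hour angle wrapped to [-180, 180) degrees
    true_solar_min = hour * 60 + eot + 4 * lon_deg
    hour_angle_deg = (true_solar_min / 4) % 360 - 180
    hour_angle = math.radians(hour_angle_deg)

    lat = math.radians(lat_deg)
    cos_zenith = (math.sin(lat) * math.sin(decl)
                  + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    cos_zenith = max(-1.0, min(1.0, cos_zenith))

    # Earth-Sun distance correction, then ETR on a normal and a horizontal plane
    e0 = (1.000110 + 0.034221 * math.cos(g) + 0.001280 * math.sin(g)
          + 0.000719 * math.cos(2 * g) + 0.000077 * math.sin(2 * g))
    etr_normal = SOLAR_CONSTANT * e0
    return {
        "cos_zenith": cos_zenith,
        "elevation_deg": 90.0 - math.degrees(math.acos(cos_zenith)),
        "hour_angle_deg": hour_angle_deg,
        "declination_deg": math.degrees(decl),
        "etr_normal": etr_normal,
        "etr_horizontal": etr_normal * max(0.0, cos_zenith),
    }


noon = solar_geometry(datetime(2020, 3, 20, 12, 0, tzinfo=timezone.utc), 0.0, 0.0)
```

Near the March equinox at lat/lon 0 the sun should be almost overhead at 12:00 UTC, which is a quick sanity check before running this over every station timestamp.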
Source: https://archive.open-meteo.com/v1/archive (pre-interpolated to station coordinates, hourly).
Variables used: shortwave/direct/diffuse radiation, cloud cover (total + low/mid/high), wind speed/direction at 10 m, CAPE, precipitation, dewpoint, surface pressure.
Three products from https://datalsasaf.lsasvcs.ipma.pt/PRODUCTS/MSG/, all covering the full African disk:
To extract per-station data: use K nearest grid cells (matched once via KDTree). Bounding-box subsetting keeps per-file I/O cheap. These datasets are HUGE (~1TB total).
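A rough sketch of the matching step, to make it concrete. To keep it dependency-free this uses a brute-force K-nearest search on unit-sphere coordinates instead of an actual KDTree; on a real grid you would build a tree once (for example with `scipy.spatial.cKDTree`) on the same coordinates. The function names, the padding value, and the toy grid are all made up for illustration.

```python
import heapq
import math


def to_unit_vector(lat_deg, lon_deg):
    """Map lat/lon to a 3D unit vector so straight-line distance tracks great-circle distance."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def subset_bbox(cells, stations, pad_deg):
    """Keep only the indices of grid cells inside the padded bounding box of all stations."""
    lat_min = min(s[0] for s in stations) - pad_deg
    lat_max = max(s[0] for s in stations) + pad_deg
    lon_min = min(s[1] for s in stations) - pad_deg
    lon_max = max(s[1] for s in stations) + pad_deg
    return [i for i, (lat, lon) in enumerate(cells)
            if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max]


def k_nearest_cells(station, cells, candidate_idx, k):
    """Indices of the k candidate cells closest to the station (brute force, not a KDTree)."""
    sx, sy, sz = to_unit_vector(*station)

    def sq_dist(i):
        cx, cy, cz = to_unit_vector(*cells[i])
        return (cx - sx) ** 2 + (cy - sy) ** 2 + (cz - sz) ** 2

    return heapq.nsmallest(k, candidate_idx, key=sq_dist)


# Toy 0.5-degree grid and one hypothetical station
grid = [(8.0 + 0.5 * i, 38.0 + 0.5 * j) for i in range(5) for j in range(5)]
station = (9.1, 38.6)
candidates = subset_bbox(grid, [station], pad_deg=0.7)
nearest = k_nearest_cells(station, grid, candidates, k=4)
```

The indices only need to be computed once per product, since the grid does not change between files; after that each file read is just an index lookup on the bounding-box subset.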
Source: NASA POWER API (MERRA-2 reanalysis with GEOS satellite correction, ~50 km grid, hourly).
Variables: all-sky/clear-sky GHI, direct normal & diffuse irradiance, clearness index, cloud fraction, AOD @ 550 nm, bias-corrected precipitation.
Source: NASA GES DISC, M2T1NXAER collection (0.5° × 0.625° grid, hourly)
Variables: speciated AODs at 550 nm — total, dust, organic carbon, black carbon, sulfate, sea-salt — plus Ångström exponent and PM2.5 surface concentrations for dust & organic carbon.
Source: Copernicus Atmosphere Monitoring Service EAC4 reanalysis (3-hourly)
Variables: same speciated AOD set as MERRA-2 (total/dust/OM/BC/sulfate/sea-salt) plus total column water vapor
Pre-computed on the same 15-min station grid
Variables: apparent zenith/elevation, relative & absolute airmass, extraterrestrial radiation, Linke turbidity, and clear-sky GHI/DNI/DHI from both the Ineichen-Perez and Haurwitz models, plus the Ineichen clear-sky index
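To show what the simplest of these features looks like, here is a small sketch of the Haurwitz clear-sky GHI and a clear-sky index built from it, written without pvlib. Haurwitz only needs the cosine of the zenith angle; the Ineichen-Perez model also needs airmass and Linke turbidity, so it is not reproduced here. The function names and the cap on the index are my own assumptions for illustration.

```python
import math


def haurwitz_ghi(cos_zenith):
    """Haurwitz clear-sky GHI in W/m^2; zero when the sun is at or below the horizon."""
    if cos_zenith <= 0:
        return 0.0
    return 1098.0 * cos_zenith * math.exp(-0.059 / cos_zenith)


def clear_sky_index(ghi_measured, ghi_clear, max_index=2.0):
    """Ratio of measured to clear-sky GHI, set to 0 at night and capped (cap is an assumption)."""
    if ghi_clear <= 0:
        return 0.0
    return min(ghi_measured / ghi_clear, max_index)


overhead = haurwitz_ghi(1.0)          # sun directly overhead
index = clear_sky_index(500.0, overhead)
```

The cap matters near sunrise and sunset, where the clear-sky value is tiny and the raw ratio can blow up.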
Can you confirm whether this score (Abs MBE: 0.160338816) is also from the above pipeline? If not, does it comply with the rule below?
The values in TargetMBE and TargetRMSE should be identical for each corresponding entry of the submission. This format is required for multi-metric evaluation.
MBE is a whole different issue; see my other post on this, which unfortunately went unanswered by the organizers. I have very strong opinions on why the MBE scoring is calibrated incorrectly, but that's for another discussion. You can get ~0 MBE without using any training data at all; you just have to guess the per-station means :)
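A toy illustration of why that works, assuming MBE is defined as the mean of (prediction − observation); the exact competition definition is not spelled out in this thread. The station names and values are made up, and the means are computed from the observations themselves just to show the arithmetic; in practice you would have to guess them.

```python
import math


def mbe(pred, obs):
    """Mean bias error: average signed difference between prediction and observation."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# Made-up observations for two hypothetical stations
obs_by_station = {
    "station_a": [0.0, 120.0, 480.0, 200.0],
    "station_b": [0.0, 300.0, 900.0, 400.0],
}

preds, obs = [], []
for values in obs_by_station.values():
    station_mean = sum(values) / len(values)
    preds.extend([station_mean] * len(values))  # constant prediction per station
    obs.extend(values)

baseline_mbe = mbe(preds, obs)    # signed errors cancel out
baseline_rmse = rmse(preds, obs)  # stays large: the constant ignores all variation
```

The constant prediction gets zero bias while its RMSE is just the spread of the data, so a near-zero MBE on its own says very little about model quality.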
These datasets should get RMSE down to below 60. Throw in a few more ensembles and all the tricks you can think of, it can go down to ~57-58.
Great post!
One dataset that's worth flagging since it didn't make your list: CAMS Solar Radiation Time-Series from Copernicus ADS (cams-solar-radiation-timeseries). That's a different product from the CAMS EAC4 aerosols you mentioned - it gives you 15-min all-sky and clear-sky GHI/BHI/DHI/BNI plus a reliability flag, pre-computed at the exact station coordinates. It's available at native 15-min cadence, so no resampling is needed to match the target grid, and it covers the full 2016-2020 span of the competition. Worth a look if you're squeezing out the last RMSE points.
On MBE - I'll skip that one, the scoring side of it deserves its own thread and I don't want to muddy yours. Looking forward to the post-deadline write-ups.
I'm late to reply, but yep, this is a really really useful dataset! Thanks for posting it!
But I have doubts. Do we get a higher score by predicting the "biased" station values in the test set, or do we need to predict the "true" values to score higher? My guess is that we need to predict the "biased" station values in the test set to get a higher score...
I think the external datasets are necessary to get a good score, but good feature selection is needed as well. Given that, I think predicting the "biased" values is a good idea.