First off, congratulations to the winners! I really struggled with this competition so I understand how hard it is to pull off a good score.
I created a repo for my solution here: https://github.com/trentpark8800/zindi-hydropower-challenge/tree/main
In the end, my approach was to:
- Use DuckDB to load the large file onto disk rather than RAM.
- Aggregate all data to a daily granularity (both the power production and climate data).
- Experiment with the Darts package. Funnily enough, LinearRegression with a lag of -1 on the target, a lag of -30 on precip_snow_ratio, and a chunk size of around 9 days did the trick! Note that I used StandardScaler to normalize the data so that all the series were on the same scale.
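For anyone wondering what those lag settings mean in practice, here is a minimal plain-Python sketch. It is not the Darts code from the repo, just a stand-in under some assumptions: the function names and the synthetic data are made up for illustration, it predicts only one day ahead (so the roughly 9-day chunk is left out), and it fits ordinary least squares by hand. A lag of -1 on the target means "yesterday's value", and a lag of -30 on precip_snow_ratio means "the value from 30 days ago".

```python
from statistics import mean, pstdev


def standardize(series):
    """Z-score a series: subtract the mean, divide by the population std."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]


def build_lagged_rows(target, covariate, target_lag=1, cov_lag=30):
    """Pair target[t] with [1, target[t - target_lag], covariate[t - cov_lag]]."""
    start = max(target_lag, cov_lag)  # first day where both lags exist
    X = [[1.0, target[t - target_lag], covariate[t - cov_lag]]
         for t in range(start, len(target))]
    y = [target[t] for t in range(start, len(target))]
    return X, y


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Synthetic daily series, invented purely for illustration (not competition data).
n_days = 120
precip_snow_ratio = [((t * 41) % 101) / 100 for t in range(n_days)]
power = [0.0] * n_days
for t in range(30, n_days):
    power[t] = 1.0 + 0.5 * power[t - 1] + 2.0 * precip_snow_ratio[t - 30]

# Scale both series, build the lagged rows, and fit.
X, y = build_lagged_rows(standardize(power), standardize(precip_snow_ratio))
weights = fit_least_squares(X, y)  # [intercept, lag-1 target, lag-30 covariate]
print(weights)
```

The scaling step matters because the two series live on very different scales; after z-scoring, the fitted weights are easier to compare with each other.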
I actually did this competition for a video I made documenting the journey. Check it out if you are interested: https://www.youtube.com/watch?v=Hpsy7ZVbspI
I will make a technical walkthrough of my actual solution soon (basically a walkthrough of the repo I attached).
Anyways, thanks so much everyone, I had a lot of fun and learned a lot!
All the best!
Thanks for the props 😅! Really appreciate it @MediumChungus. Amazing video content, good lighting, and nice background music.
Really appreciate it man🤘