Hi, all!
I would like to share a notebook in which I process the data so that it takes up about 3-4 times less space without losing information.
In the first part, more compact dtypes are selected for the columns and the results are saved as .pkl, which shrinks each of the train/test_frames{i} files by a factor of 3-4.
In the second part, the train/test_frames{i} fragments are concatenated into a single file. The final train_frames.pkl takes up 10 GB of RAM, which gives more room for experimenting on limited hardware.
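For anyone who wants the idea without downloading the notebook, the first part presumably works along these lines (a minimal sketch, not the notebook's actual code; the `shrink_dtypes` helper and the column names are my own, and note that downcasting floats to float32 is only lossless if your values fit in float32):

```python
import numpy as np
import pandas as pd

def shrink_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns to the smallest dtype that holds their values."""
    for col in df.select_dtypes(include=["int"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    for col in df.select_dtypes(include=["float"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    return df

# Toy frame standing in for one train/test_frames{i} fragment.
df = pd.DataFrame({"a": np.arange(1000, dtype="int64"),
                   "b": np.random.rand(1000)})
before = df.memory_usage(deep=True).sum()
df = shrink_dtypes(df)
after = df.memory_usage(deep=True).sum()
print(before, after)  # memory footprint drops after downcasting
# df.to_pickle("train_frames0.pkl")  # then save the compact frame as .pkl
```

Saving the downcast frame with `to_pickle` preserves the small dtypes, unlike CSV, which is why reloading the .pkl stays cheap.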
You can download the notebook from: https://disk.yandex.ru/d/rnUFxjzICF3mGw
I hope it will be helpful. Good luck!
UPD: @mdkoz found a bug: the loop that joins the test pickles only goes up to 4 and skips the 5th item.
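The off-by-one above can be illustrated like this (a sketch with in-memory stand-ins for the test_frames{i}.pkl fragments; the exact fragment count in the notebook may differ):

```python
import pandas as pd

# Simulate six fragments (in the notebook these come from test_frames{i}.pkl files).
parts = [pd.DataFrame({"x": [i]}) for i in range(6)]

# Buggy loop: range(5) yields indices 0..4 only, silently dropping the last fragment.
buggy = pd.concat([parts[i] for i in range(5)], ignore_index=True)

# Fixed loop: iterate over all six indices so every fragment is joined.
fixed = pd.concat([parts[i] for i in range(6)], ignore_index=True)

print(len(buggy), len(fixed))  # → 5 6
```

Remember that Python's `range(n)` stops at `n - 1`, so the upper bound must equal the number of fragments, not the last index.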
Very helpful. Thank you.
There is a bug in this script: when joining the test pickles, your loop only goes up to 4 and skips over the 5th item.
You are right, thank you!
No problem, thanks for the script
You're welcome to continue the conversation here: https://discord.gg/TwsnzK8k
Hi, I'm getting a memory error while reading the pickle files. What's the fix?