
Hulkshare Recommendation Algorithm Challenge

$7,500 USD
Challenge completed over 3 years ago
Prediction
Collaborative Filtering
510 joined
70 active
Start: Feb 03, 22
Close: May 01, 22
Reveal: May 01, 22
Data compression
Notebooks · 12 Mar 2022, 18:38 · edited 14 days later · 6

Hi, all!

I would like to share a notebook that processes the data so that it takes up about 3-4 times less space without losing any information.

In the first part, more compact dtypes are selected for the columns and the results are saved as .pkl, which shrinks each of the train/test_frames{i} files by a factor of 3-4.
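The downcasting step can be sketched roughly like this (the column names and file name below are hypothetical, just to illustrate the technique; the notebook itself works on the competition's own columns):

```python
import numpy as np
import pandas as pd

def downcast_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Pick a more compact dtype for each column without losing information."""
    for col in df.columns:
        ser = df[col]
        if pd.api.types.is_integer_dtype(ser):
            # e.g. int64 -> int16 when the value range allows it
            df[col] = pd.to_numeric(ser, downcast="integer")
        elif pd.api.types.is_float_dtype(ser):
            # float64 -> float32
            df[col] = pd.to_numeric(ser, downcast="float")
        elif ser.dtype == object and ser.nunique() < 0.5 * len(ser):
            # low-cardinality strings are much cheaper as categoricals
            df[col] = ser.astype("category")
    return df

# Toy frame standing in for one train/test fragment
df = pd.DataFrame({
    "user_id": np.arange(1000, dtype=np.int64),
    "score": np.random.rand(1000),
    "genre": np.random.choice(["rock", "pop", "rap"], size=1000),
})
before = df.memory_usage(deep=True).sum()
df = downcast_columns(df)
after = df.memory_usage(deep=True).sum()
df.to_pickle("train_frame0.pkl")  # hypothetical output name
```

Saving the downcast frame with `to_pickle` preserves the compact dtypes on disk, which is where most of the 3-4x reduction comes from.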

In the second part, the train/test_frames{i} fragments are concatenated into a single file. The final train_frames.pkl takes up 10 GB of RAM, which leaves more room for experimenting on limited hardware.
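The concatenation step amounts to reading each fragment pickle and stacking them. A minimal self-contained sketch, assuming five fragments named `train_frames{i}.pkl` (the toy frames below stand in for the real fragments):

```python
import pandas as pd

# Build five small stand-in fragment files (the real ones are the
# competition's train_frames0.pkl .. train_frames4.pkl).
for i in range(5):
    part = pd.DataFrame({"item_id": range(i * 10, i * 10 + 10)})
    part.to_pickle(f"train_frames{i}.pkl")

# range(5) covers indices 0..4 inclusive, so every fragment is read.
parts = [pd.read_pickle(f"train_frames{i}.pkl") for i in range(5)]
train_frames = pd.concat(parts, ignore_index=True)
train_frames.to_pickle("train_frames.pkl")
```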

You can download the notebook here: https://disk.yandex.ru/d/rnUFxjzICF3mGw

I hope it will be helpful. Good luck!

UPD: @mdkoz found a bug: the loop that joins the test pickles only goes up to 4 and skips the 5th item.
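The off-by-one is easy to reproduce. With fragment indices running 0..4, `range(4)` yields only 0..3, so the last test pickle is silently dropped (the file names below are assumed from the naming scheme in the post):

```python
# Buggy: range(4) -> indices 0..3, so the 5th fragment is never read.
buggy = [f"test_frames{i}.pkl" for i in range(4)]
# Fixed: range(5) -> indices 0..4, all five fragments included.
fixed = [f"test_frames{i}.pkl" for i in range(5)]
```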

Discussion · 6 answers
flamethrower

Very helpful. Thank you.

12 Mar 2022, 20:47
Upvotes 0

There is a bug in this script. When joining the test pickles your loop only goes up to 4 and skips over the 5th item.

26 Mar 2022, 14:54
Upvotes 0

You are right, thank you!

No problem, thanks for the script

You're welcome to continue the conversation here: https://discord.gg/TwsnzK8k

12 Apr 2022, 08:24
Upvotes 0

Hi, I'm getting a memory error while reading the pickle files. What's the fix?

16 Apr 2022, 03:06
Upvotes 0