Hulkshare Recommendation Algorithm Challenge

$7 500 USD
Challenge completed over 3 years ago
Prediction
Collaborative Filtering
510 joined
70 active
Start
Feb 03, 22
Close
May 01, 22
Reveal
May 01, 22
flamethrower
Ideas for Quick Processing for Feature Engineering
Data · 17 Mar 2022, 09:54 · edited 2 minutes later · 2

Hello,

Proper feature engineering is a major hassle in this challenge: with a dataset this large, generating features takes a long time. Can we use this discussion to share how we are optimizing the process? I will update it with any ideas I find as I explore ways to speed things up. Let's learn from each other about parallelism and efficiency.

Thank you.

Discussion 2 answers
flamethrower

Update:

Here are some helpful resources for accelerated workflows and data processing:

Memory usage reduction: https://www.kaggle.com/code/gemartin/load-data-reduce-memory-usage/notebook

RAPIDS cuDF: accelerated workflows for tabular data on CUDA: https://docs.rapids.ai/api/cudf/stable/

RAPIDS cuML: accelerated model training, with GPU versions of traditional machine learning algorithms: https://www.analyticsvidhya.com/blog/2022/01/cuml-blazing-fast-machine-learning-model-training-with-nvidias-rapids/, https://medium.com/rapids-ai/10-minutes-to-rapids-cudf-and-dask-cudf-3d16fcb84139

pandas + Dask: https://pandas.pydata.org/docs/user_guide/scale.html, https://www.vantage-ai.com/en/blog/4-strategies-how-to-deal-with-large-datasets-in-pandas, https://docs.dask.org/en/stable/10-minutes-to-dask.html
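
To make two of the ideas in these links concrete, here is a minimal pandas-only sketch: downcasting numeric columns to smaller dtypes, and aggregating a large CSV in chunks rather than loading it all at once. I have not run it on the challenge data. The column names `user_id`, `plays` and `score`, and the helper names, are assumptions made up for illustration, not the competition's actual schema.

```python
import io

import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose numeric columns use the smallest dtype that fits."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


def chunked_play_counts(csv_source, chunksize: int = 100_000) -> pd.Series:
    """Sum `plays` per `user_id` (hypothetical columns) one chunk at a time."""
    totals = None
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        part = chunk.groupby("user_id")["plays"].sum()
        # Align on user_id; users missing from one side count as 0.
        totals = part if totals is None else totals.add(part, fill_value=0)
    # The aligned addition can promote to float, so cast back to integers.
    return totals.astype("int64")


if __name__ == "__main__":
    # Tiny made-up frame standing in for the real interaction log.
    demo = pd.DataFrame(
        {"user_id": [1, 2, 1, 3], "plays": [5, 1, 2, 7], "score": [0.5, 1.5, 2.5, 3.5]}
    )
    small = downcast_numeric(demo)
    print(demo.memory_usage(deep=True).sum(), small.memory_usage(deep=True).sum())
    print(chunked_play_counts(io.StringIO(demo.to_csv(index=False)), chunksize=2))
```

Downcasting trades numeric range and precision for memory, so it is worth checking value ranges before applying it to ID or count columns. The chunked loop only works for aggregations that can be combined across chunks, such as sums and counts.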

P.S. I have yet to implement these and see the benefits.

You're welcome to continue the conversation on our Discord: https://discord.gg/TwsnzK8k

12 Apr 2022, 08:19
Upvotes 0