
Absa Corporate Client Activity Forecasting Challenge

Helping South Africa
$5 000 USD
Completed (~3 years ago)
Forecast
150 joined
46 active
Start: Nov 01, 22
Close: Nov 27, 22
Reveal: Nov 27, 22
21db
Public LB 1st Place Notebook
Notebooks · 30 Nov 2022, 11:04 · 5

https://www.kaggle.com/code/danielbruintjies/absa-client-activity-forcasting-challenge/notebook

Solution: an ensemble of 4 multivariate, multistep LSTM models with UserID embeddings; training time ~10 min per model on a Kaggle GPU.
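The ensembling step can be sketched independently of the models themselves (numpy, with hypothetical shapes; simple averaging is an assumption here — the notebook may combine the 4 models differently). Each LSTM emits multistep predictions per user, and the ensemble is their average:

```python
import numpy as np

# Hypothetical shapes: 4 models, each predicting 4 future steps
# for every user (multistep output), as probabilities in [0, 1].
rng = np.random.default_rng(0)
n_models, n_users, n_steps = 4, 100, 4
preds = rng.random((n_models, n_users, n_steps))  # stand-in for model outputs

# Average over the model axis to get one prediction per (user, step).
ensemble = preds.mean(axis=0)
print(ensemble.shape)  # (100, 4)
```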

Main learnings:

1. TensorFlow is a very good framework for quickly defining and picking a cool model architecture, but tf is not deterministic on GPU, so results are not reproducible... -> I should have converted to a PyTorch model after finding the best architecture.
2. I should have taken the time to build a better validation strategy, e.g. a validation set with a target distribution similar to the LB test set (for this case, only 4 targets per user). Because I did not do this, I did not successfully pick a model that would've done well on private.
3. Keep track of all features created, take time to look back on them, and make sure the discarded ones were really worth throwing away.
4. Start competitions early to leave more time to learn.
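A minimal sketch of the validation idea in point 2 (pure Python, with a hypothetical data layout of `(user_id, timestep, target)` rows): hold out the last 4 observations per user so the validation set mirrors a test set with ~4 targets per user.

```python
from collections import defaultdict

def time_based_split(rows, n_val_per_user=4):
    """Hold out the last `n_val_per_user` observations of each user as
    validation, mirroring an LB test set with ~4 targets per user.
    `rows` is a list of (user_id, timestep, target) tuples."""
    by_user = defaultdict(list)
    for user_id, t, y in rows:
        by_user[user_id].append((t, y))
    train, val = [], []
    for user_id, obs in by_user.items():
        obs.sort()  # chronological order within each user
        cut = max(len(obs) - n_val_per_user, 0)
        train += [(user_id, t, y) for t, y in obs[:cut]]
        val += [(user_id, t, y) for t, y in obs[cut:]]
    return train, val

# Toy data: u1 has 10 observations, u2 has 6.
rows = [("u1", t, t % 2) for t in range(10)] + [("u2", t, 0) for t in range(6)]
train, val = time_based_split(rows)
print(len(train), len(val))  # u1 keeps 6/4, u2 keeps 2/4 -> 8 8
```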

Congrats to all and thank you @Zindi and Absa for hosting this cool competition, it was indeed a challenge!

Discussion (5 answers)
skaak
Ferra Solutions

Nice thanks @DanielBruintjies and congrats! You really did well.

I like tf a lot - results should be reproducible? How about

tf.keras.utils.set_random_seed(123)

but it depends on the tf version; I think Kaggle has a relatively old one.

Thanks for sharing - this is awesome.

You have a relatively big model and big embeddings ... wow.

You see @wuuthraad, you need a GPU: D's model has 11M+ weights.

30 Nov 2022, 11:28
Upvotes 0
skaak
Ferra Solutions

This is a nice model - perhaps a bit big; looking at the graphs, it starts to overfit quite soon. How was performance with a smaller model? Also - wow!!!! - it seems you added an attention layer?

30 Nov 2022, 11:36
Upvotes 0
21db

Thanks! I spent a lot of time tuning the model, playing with different embedding sizes and layers (lstm/gru/cnn/dropout), and anything other than this combo decreased performance on my val set. I don't have the exact reasoning behind it (I still have a lot to learn about DL theory, like what attention actually is and why certain layers work and others don't), so I can't quite answer your question.

Regarding the seed: after lots of attempts and subs I found out pretty late that there were already lots of discussions online about TF on GPU not being reproducible. As for the version, in the latest TF set_random_seed() has changed to set_seed().
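For what it's worth, the reproducibility idea itself is easy to demonstrate (pure Python below; the TF-specific calls in the comment are based on the current API and are an assumption about your setup):

```python
import random

def seed_everything(seed=123):
    """Seed Python's RNG. In a TF project you would additionally call
    tf.keras.utils.set_random_seed(seed) (newer versions) or
    tf.random.set_seed(seed) (older ones); note that GPU ops can still be
    nondeterministic unless tf.config.experimental.enable_op_determinism()
    is also enabled (TF 2.8+)."""
    random.seed(seed)

seed_everything()
a = [random.random() for _ in range(3)]
seed_everything()  # reseed with the same value
b = [random.random() for _ in range(3)]
print(a == b)  # True: identical draws after reseeding
```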

skaak
Ferra Solutions

Ok. Yes, I also saw stuff like that (tf and GPU), so I guess I'm lucky I don't have a GPU :-(

I had a (very, very) roughly similar model to yours: embeddings, GRU based, using tf. What you did was predict 0s and 1s, whereas I predicted actual events. I think yours is a really nice simplification. Also, the way you did the inputs is, I think, very useful, practical and powerful! Sorry, just thinking out loud - I'm really impressed with your approach. The model could perhaps be simpler, but the stuff around it is at such a high level that I can't help but be impressed.

21db

Thank you!