https://www.kaggle.com/code/danielbruintjies/absa-client-activity-forcasting-challenge/notebook
Solution: an ensemble of 4 multivariate, multistep LSTM models with UserID embeddings; training time ~10 min per model on a Kaggle GPU.
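The general shape of one ensemble member, as a minimal sketch (all dimensions here are made up; the notebook linked above has the real architecture and hyperparameters):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical sizes -- placeholders, not the values from the notebook.
N_USERS = 50_000   # UserID vocabulary size
EMB_DIM = 64       # embedding width
N_STEPS_IN = 30    # input history length
N_FEATURES = 12    # multivariate features per timestep
N_STEPS_OUT = 4    # multistep horizon (4 targets per user)

# Sequence branch: per-timestep activity features through an LSTM.
seq_in = layers.Input(shape=(N_STEPS_IN, N_FEATURES), name="activity_seq")
x = layers.LSTM(128)(seq_in)

# Static branch: a learned embedding for each UserID.
user_in = layers.Input(shape=(1,), dtype="int32", name="user_id")
u = layers.Flatten()(layers.Embedding(N_USERS, EMB_DIM)(user_in))

# Concatenate both branches and predict all future steps at once.
h = layers.Dense(64, activation="relu")(layers.Concatenate()([x, u]))
out = layers.Dense(N_STEPS_OUT, activation="sigmoid", name="targets")(h)

model = Model([seq_in, user_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```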
Main learnings:
1. TensorFlow is a very good framework for quickly defining and trying out a cool model architecture, but tf is not deterministic, and results are not reproducible... I should have converted to a PyTorch model after finding the best architecture.
2. I should have taken the time to build a better validation strategy, i.e. a validation set with a target distribution similar to the LB test set (in this case, only 4 targets per user). Because I did not do this, I did not manage to pick a model that would have done well on private (see the sketch after this list).
3. Keep track of all the features you create, and take time to look back at them and make sure the ones you threw away were actually worth throwing away.
4. Start competitions early so there is more time to learn.
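For point 2, a minimal sketch of what I mean, assuming a long-format training frame with made-up column names (UserID, month, target):

```python
import pandas as pd

# Hypothetical long-format training frame -- column names are assumptions.
df = pd.read_csv("train.csv")  # columns: UserID, month, target, ...
df = df.sort_values(["UserID", "month"])

# Mimic the LB test set by holding out exactly the last 4 targets per user,
# so the validation targets are distributed like what actually gets scored.
val = df.groupby("UserID").tail(4)
train = df.drop(val.index)
```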
Congrats to all, and thank you @Zindi and Absa for hosting this cool competition. It was indeed a challenge!
Nice thanks @DanielBruintjies and congrats! You really did well.
I like tf a lot; shouldn't results be reproducible? How about setting the seeds? But it depends on the tf version; I think Kaggle has a relatively old one.
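Something along these lines, maybe (not a guaranteed recipe; which determinism knobs exist depends on the TF version):

```python
import os
import random
import numpy as np
import tensorflow as tf

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# TF 2.x; in TF 1.x this was tf.set_random_seed(SEED).
tf.random.set_seed(SEED)

# Even with all seeds fixed, many GPU kernels stay nondeterministic.
# Newer TF versions can force deterministic ops, at some speed cost:
os.environ["TF_DETERMINISTIC_OPS"] = "1"
# tf.config.experimental.enable_op_determinism()  # only in recent TF releases
```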
Thanks for sharing - this is awesome.
You have a relatively big model and big embeddings... wow.
You see @wuuthraad, you need a GPU; D's model has 11M+ weights.
This is a nice model, though perhaps a bit big; looking at the graphs, the model starts to overfit quite soon. How was performance when you used a smaller model? Also (wow!!!!) it seems like you added an attention layer?
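Something like this, I would guess (a dot-product self-attention over the LSTM outputs; shapes made up, surely not your exact architecture):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

seq_in = layers.Input(shape=(30, 12))                # made-up shape
h = layers.LSTM(128, return_sequences=True)(seq_in)  # keep the full sequence

# Dot-product self-attention: every timestep attends to every other
# timestep, then the attended sequence is pooled to a single vector.
att = layers.Attention()([h, h])
pooled = layers.GlobalAveragePooling1D()(att)
out = layers.Dense(4, activation="sigmoid")(pooled)

model = Model(seq_in, out)
```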
Thanks! I spent a lot of time tuning the model, playing with different embedding sizes and layers (LSTM/GRU/CNN/dropout), and anything other than this combo decreased performance on my validation set. I don't have the exact reasoning behind it (I still have a lot to learn about DL theory, like what attention actually is and why certain layers work while others don't), so I can't quite answer your question.
Regarding the seed: after lots of attempts and submissions, I found out pretty late that there were already lots of discussions online about TF on GPU not being reproducible. As for the version, in the latest TF set_random_seed() has been changed to set_seed().
Ok. Yes, I also saw stuff like that (tf and GPU), so I guess I'm lucky I don't have a GPU :-(
I had a (very, very) roughly similar model to yours: embeddings, GRU-based, using tf. What you did is predict 0s and 1s, whereas I predicted actual events; I think yours is a really nice simplification. Also, the way you did the inputs is, I think, very useful, practical, and powerful! Sorry, just thinking out loud. I'm really impressed with your approach. The model itself could perhaps be simpler, but the stuff around it is at such a high level that I can't help but be impressed.
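In sketch form, the simplification as I understand it (column names are my guesses): instead of modelling the raw events, just binarize activity per user and timestep:

```python
import numpy as np
import pandas as pd

# Hypothetical event log -- column names are assumptions.
events = pd.read_csv("events.csv")  # columns: UserID, month, event_type

# Predict "did any activity happen" (0/1) rather than the events themselves:
# pivot to one row per user, one column per month, and binarize the counts.
activity = (
    events.groupby(["UserID", "month"]).size()  # event counts per user-month
          .unstack(fill_value=0)                # users x months matrix
          .gt(0).astype(np.int8)                # counts -> 0/1 indicators
)
```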
Thank you!