I won a hackathon competition in November and stayed at the top of the leaderboard for months using only Non-negative Matrix Factorization and a single data source. Check out how I did it, and how I organize and plan my Data Science projects to go from zero to hero: https://github.com/marcusinthesky/Zindi-Uber-RANRAIL-Movement-Challenge.
Non-negative Matrix Factorization (NNMF) is a simple and classic method used in Recommender Systems. In a recommender system, you typically have a matrix with each row representing a user and each column a movie, with ratings in each cell. Using NNMF you factorize this ratings matrix into two smaller non-negative matrices whose product best reconstructs the original. This can be difficult to cross-validate, since you have to randomly delete entries across the matrix and test how well you recover them, but it can be incredibly robust to noise and overfitting. Not many Data Scientists are exposed to this method, but it is a classical approach every Data Scientist should have in their back pocket. It is about as far from XGBoost/CatBoost/LightGBM as you can get, proving there are many ways to tackle a problem if we think creatively enough.
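To make the idea concrete, here is a minimal sketch using scikit-learn's `NMF` on a made-up users-by-movies ratings matrix, including a rough version of the hold-out-entries cross-validation described above (the data and the mean-imputation trick are my own illustrative assumptions, not the competition solution):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Toy users x movies ratings matrix (non-negative by construction).
R = rng.integers(1, 6, size=(20, 10)).astype(float)

# Factorize R ~= W @ H, with W (users x k) and H (k x movies) non-negative.
model = NMF(n_components=3, init="random", random_state=0, max_iter=500)
W = model.fit_transform(R)
H = model.components_
R_hat = W @ H  # low-rank reconstruction of the ratings

# Rough cross-validation: hide 10% of entries, impute them with the mean
# of the visible entries, refit, then score recovery of the hidden cells.
# (scikit-learn's NMF has no native missing-value support, so this
# mean-imputation step is a simplification.)
mask = rng.random(R.shape) < 0.1
R_train = R.copy()
R_train[mask] = R[~mask].mean()
W2 = model.fit_transform(R_train)
held_out_rmse = np.sqrt(((W2 @ model.components_ - R)[mask] ** 2).mean())
```

The non-negativity constraint is what makes the factors interpretable: each of the `k` components behaves like a "taste profile" that only ever adds to a predicted rating, never subtracts.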
I post weekly Data Science notebooks and articles on my LinkedIn, please check it out: https://www.linkedin.com/in/marcussky/.
Sweet, thank you for sharing! I dig the way you set up your project.
Thank you for the feedback. I have gotten really into a tool called Kedro by QuantumBlack (check it out on GitHub). It tries to do a lot, which can be a blessing and a curse, but it helps me maintain good habits and handles the things I hate for me, like setting up automatic documentation, building wheels and logging. It lets you go from zero to hero really quickly, and their data versioning tool is just amazing.
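For anyone curious what that data versioning looks like in practice, it is driven by Kedro's Data Catalog: a YAML config where flipping `versioned: true` makes Kedro save each run's output under a timestamped path. A minimal sketch (the dataset name and filepath are made up, and the exact `type` string varies by Kedro version):

```yaml
# conf/base/catalog.yml -- illustrative entry, not from the repo above
movie_ratings:
  type: pandas.CSVDataSet
  filepath: data/01_raw/movie_ratings.csv
  versioned: true  # each save goes to a new timestamped subfolder
```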