PRACTICE Beginner Challenge
Can you predict a measure of wealth for different locations across Africa?
Prize: Knowledge · Time: Ended ~1 year ago · Participants: 107 active · 553 enrolled
Tags: Helping Africa · Good for beginners · Prediction · Financial Services
UmojaHack Practice Beginner Challenge
Notebooks · 18 Mar 2022, 12:21 · 5

You can find my solution on GitHub: https://github.com/muchemicarol/economic-well-being-prediction-competition/blob/master/UmojaHack.ipynb

Feel free to suggest improvements.

Discussion 5 answers

You did a really nice EDA! I really liked it.

But you didn't do a lot in terms of feature engineering and modeling.

Try summing all the 'houses built' features into one additional feature, and do the same with the 'water' and 'cropland' features.
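
A minimal pandas sketch of that idea, assuming the related columns share a common name prefix. The column names (`built_*`, `water_*`, `crop_*`) and values are made up for illustration; the real dataset's names will differ. Each group is collapsed into one extra total column, and the original columns are kept.

```python
import pandas as pd

# Toy frame with hypothetical column names; the competition data uses its own names.
df = pd.DataFrame({
    "built_pre_1975": [0.10, 0.00, 0.05],
    "built_1975_2000": [0.20, 0.10, 0.00],
    "built_2000_2014": [0.05, 0.30, 0.10],
    "water_permanent": [0.00, 0.40, 0.10],
    "water_seasonal": [0.10, 0.00, 0.20],
    "crop_rainfed": [0.30, 0.10, 0.00],
    "crop_irrigated": [0.00, 0.20, 0.10],
})


def add_group_sums(frame: pd.DataFrame, prefixes: dict) -> pd.DataFrame:
    """Return a copy with one extra column per group: the row-wise sum of
    every original column whose name starts with the given prefix."""
    out = frame.copy()
    for new_col, prefix in prefixes.items():
        # Select from the original frame so new totals are never summed again.
        cols = [c for c in frame.columns if c.startswith(prefix)]
        out[new_col] = frame[cols].sum(axis=1)
    return out


df = add_group_sums(df, {
    "built_total": "built_",
    "water_total": "water_",
    "crop_total": "crop_",
})
print(df[["built_total", "water_total", "crop_total"]])
```

Keeping the originals alongside the totals lets a model use either; whether the totals actually help is worth checking with cross-validation.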

Try frequency encoding the categorical features.
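
A sketch of frequency encoding with pandas: each category is replaced by the share of training rows in which it appears. The `country` and `urban_or_rural` columns and their values are placeholders, not the competition's actual data.

```python
import pandas as pd

# Placeholder categorical data standing in for the real train/test sets.
train = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Ghana", "Nigeria", "Kenya", "Ghana"],
    "urban_or_rural": ["U", "R", "R", "U", "R", "R"],
})
test = pd.DataFrame({"country": ["Ghana", "Togo"], "urban_or_rural": ["U", "R"]})


def frequency_encode(train_col: pd.Series, other_col: pd.Series):
    """Map each category to its relative frequency in the training column.
    Categories never seen in training are encoded as 0.0."""
    freqs = train_col.value_counts(normalize=True)
    return train_col.map(freqs), other_col.map(freqs).fillna(0.0)


for col in ["country", "urban_or_rural"]:
    train[f"{col}_freq"], test[f"{col}_freq"] = frequency_encode(train[col], test[col])

print(train)
print(test)
```

Here the frequencies come from the training set only and unseen test categories get 0.0; computing them over train and test combined is another common variant.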

And try some more complex models such as XGBoost or LightGBM, and tune them.

This should give you a much better score.

Good luck!

18 Mar 2022, 14:57

Those are good pointers, thanks! I wasn't sure how to use the categorical variables, so I just dropped them, but I'm now reconsidering. They'll probably improve my model's performance.

I didn't quite understand your suggestion about the 'houses built', 'water' and 'cropland' features. Would you mind explaining a little further?

I'm actually about to start working on XGBoost. I'm curious to see how much better it performs than the tuned random forest regressor.

One question, though: how do you go about hyperparameter tuning for your models?

Regarding the features, you can look at the feature engineering part of my notebook:

https://github.com/mohammad2012191/Projects_Portfolio/blob/main/%5BZindi%5D%20Economic%20Well-Being%20-%20Regression/Economic-Notebook.ipynb

For the hyperparameters, you can start with a randomized search, then pick some promising values and refine them with a grid search.
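
A two-stage sketch of that workflow using scikit-learn's `RandomizedSearchCV` and `GridSearchCV`, assuming a random forest like the one mentioned in this thread. The synthetic data from `make_regression`, the candidate lists, and the RMSE scorer are illustrative assumptions, not recommended values. Stage 1 samples a few configurations from wide lists; stage 2 grid-searches a small neighbourhood around the best one.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in for the competition's training data.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Stage 1: randomly sample 10 configurations from wide candidate lists.
wide_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 4, 8, 16],
    "min_samples_leaf": [1, 2, 5, 10],
    "max_features": [0.3, 0.6, 1.0],
}
coarse = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=wide_space,
    n_iter=10,
    cv=3,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
coarse.fit(X, y)
best = coarse.best_params_

# Stage 2: exhaustively try a small neighbourhood around the stage-1 winner,
# keeping n_estimators and max_depth fixed.
leaf, frac = best["min_samples_leaf"], best["max_features"]
narrow_grid = {
    "n_estimators": [best["n_estimators"]],
    "max_depth": [best["max_depth"]],
    "min_samples_leaf": sorted({max(1, leaf - 1), leaf, leaf + 1}),
    "max_features": sorted({max(0.1, round(frac - 0.1, 2)), frac,
                            min(1.0, round(frac + 0.1, 2))}),
}
fine = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid=narrow_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
fine.fit(X, y)

# Scores are negated RMSE, so flip the sign to report RMSE.
print("stage 1:", best, -coarse.best_score_)
print("stage 2:", fine.best_params_, -fine.best_score_)
```

The candidate lists are only starting guesses; one common heuristic is to widen or shift a list when the best value lands on its edge.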

I've checked out the notebook and it makes sense now. Thanks for the tip.

Regarding hyperparameters, I mean the actual values. How do you decide which values to use in the randomized search or grid search? For instance, for n_estimators you could try [200, 400, 1000]. How do you choose these values?

One more question: what is the benefit of combining the features?