You did a really nice EDA! I really liked it.
But you didn't do much in terms of feature engineering and modeling.
Try adding all the 'houses built' features together into one additional feature. Do the same with the 'water' and 'cropland' features.
Try using a frequency encoder on the categorical features.
And try some more complex models such as xgboost or lgbm, and tune them.
This should give you a much better result.
Good luck!
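To make the two feature suggestions above concrete, here is a minimal pandas sketch. The column names (`built_pre_1975`, `water_permanent`, `country`, and so on) and the helper functions are hypothetical stand-ins, not the real dataset columns, so adjust the prefixes to whatever your data uses.

```python
import pandas as pd

# Toy data; every column name here is a hypothetical stand-in.
df = pd.DataFrame({
    "built_pre_1975": [0.10, 0.00, 0.30],
    "built_1975_1990": [0.05, 0.02, 0.10],
    "water_permanent": [0.00, 0.20, 0.01],
    "water_seasonal": [0.01, 0.05, 0.00],
    "country": ["A", "B", "A"],
})


def add_group_sum(frame, prefix, new_name):
    """Add a column holding the row-wise sum of all columns starting with `prefix`."""
    cols = [c for c in frame.columns if c.startswith(prefix)]
    frame[new_name] = frame[cols].sum(axis=1)
    return frame


def add_frequency_encoding(frame, col):
    """Add a column mapping each category to the share of rows it appears in."""
    freq = frame[col].value_counts(normalize=True)
    frame[col + "_freq"] = frame[col].map(freq)
    return frame


# New names start with "total_" so they are not picked up by the prefix match.
df = add_group_sum(df, "built_", "total_built")
df = add_group_sum(df, "water_", "total_water")
df = add_frequency_encoding(df, "country")
print(df)
```

With a separate train and test set, you would normally compute the frequencies on the training data and map the same values onto the test data, so both are encoded consistently.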
Those are good pointers, thanks! I wasn't sure about using the categorical variables so I just dropped them, but I'm now reconsidering them. They'll probably improve my model's performance.
I didn't quite understand your suggestion on the 'houses built', 'water' and 'cropland' features. Mind explaining a little further?
I'm actually about to start working on xgboost. I'm curious to see how much better it performs compared to the tuned random forest regressor.
I have a question though: how do you go about hyperparameter tuning for your models?
Regarding the features, you can look at the feature engineering part of my notebook:
https://github.com/mohammad2012191/Projects_Portfolio/blob/main/%5BZindi%5D%20Economic%20Well-Being%20-%20Regression/Economic-Notebook.ipynb
For the hyperparameters, you can start with a randomized search, then pick some promising values and run a grid search around them.
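Here is a rough sketch of that two-stage idea with scikit-learn, assuming a random forest regressor on synthetic data. The parameter ranges and the way the narrow grid is built around the best values are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic data standing in for the real training set.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Stage 1: sample a handful of combinations from wide, coarse ranges.
wide = {
    "n_estimators": [20, 50, 100],
    "max_depth": [3, 6, None],
    "min_samples_leaf": [1, 3, 5],
}
random_search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=wide,
    n_iter=5,
    cv=3,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
random_search.fit(X, y)
best = random_search.best_params_

# Stage 2: exhaustive grid over a narrow neighbourhood of the best point.
n_est = best["n_estimators"]
leaf = best["min_samples_leaf"]
narrow = {
    "n_estimators": sorted({max(10, n_est - 10), n_est, n_est + 10}),
    "max_depth": [best["max_depth"]],
    "min_samples_leaf": sorted({max(1, leaf - 1), leaf, leaf + 1}),
}
grid_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid=narrow,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
grid_search.fit(X, y)

print("randomized search:", best, random_search.best_score_)
print("grid search:", grid_search.best_params_, grid_search.best_score_)
```

Because the narrow grid contains the best point from the randomized search, the second stage can only match or improve on the cross-validated score of the first.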
I've checked out the notebook and it makes sense now. Thanks for the tip.
Regarding hyperparameters, I actually meant the parameter values themselves. How do you decide which values to put into the randomized search or grid search? For instance, for n_estimators you could try [200, 400, 1000]. How do you choose these values?
Question: what is the significance of combining the features?