I'm willing to learn more, so please share your ideas if you don't mind (keep in mind I'm still a novice):
1. mapping LGA_Name and State values to correct values using provided "NigerianStateNames.csv"
2. removing duplicates from the train data using the subset ['Policy Start Date', 'Policy End Date', 'ProductName', 'Age', 'No_Pol', 'Gender', 'Car_Category', 'Subject_Car_Colour', 'Subject_Car_Make', 'LGA_Name', 'State']
3. other steps like filling NaN values, etc.
NB: more could be done here
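The preprocessing steps above could be sketched roughly like this. Note this is not my actual code: the dedup column list is taken from step 2, but the layout of "NigerianStateNames.csv" (here a `raw_state` column mapping to a clean `State` column) and the fill strategy are assumptions.

```python
import pandas as pd

# Columns used for dropping duplicates (from step 2 above).
DEDUP_COLS = ['Policy Start Date', 'Policy End Date', 'ProductName', 'Age',
              'No_Pol', 'Gender', 'Car_Category', 'Subject_Car_Colour',
              'Subject_Car_Make', 'LGA_Name', 'State']

def preprocess(train: pd.DataFrame, state_names: pd.DataFrame) -> pd.DataFrame:
    # 1. Map State values to canonical names from the lookup CSV.
    #    Assumes the lookup has 'raw_state' and 'State' columns.
    state_map = dict(zip(state_names['raw_state'], state_names['State']))
    train['State'] = train['State'].replace(state_map)

    # 2. Drop duplicate rows on the subset of columns above.
    train = train.drop_duplicates(subset=DEDUP_COLS)

    # 3. Fill missing values (a simple placeholder strategy).
    for col in train.select_dtypes(include='object'):
        train[col] = train[col].fillna('missing')
    return train
```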
1. binary indicator variables: 1 if the value was imputed, else 0
2. binary indicator variables: 1 if the value is rare, else 0
3. date features (year, month, week, day) and policy duration (year, month, week, day)
4. interaction features (level-2 combinations of the categorical features and the original numerical features (No_Pol, Age))
5. interaction features between the original numerical features No_Pol and Age, e.g. addition, multiplication and subtraction
6. Weight of Evidence encoding for categorical features (Target Encoding worked well too in my case)
7. row-wise aggregates: sum, min, max, kurtosis, std, skew, median, etc.
8. transforming the data using a power transform
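Two of the feature-engineering steps above (date features and Weight of Evidence encoding) could look roughly like this; again a sketch, with the date column names assumed from my preprocessing list and a small epsilon added to the WoE formula to avoid division by zero:

```python
import numpy as np
import pandas as pd

def add_date_features(df, start='Policy Start Date', end='Policy End Date'):
    # Calendar parts of the start date, plus policy duration in days.
    s, e = pd.to_datetime(df[start]), pd.to_datetime(df[end])
    df['start_year'] = s.dt.year
    df['start_month'] = s.dt.month
    df['start_week'] = s.dt.isocalendar().week.astype(int)
    df['start_day'] = s.dt.day
    df['duration_days'] = (e - s).dt.days
    return df

def woe_encode(df, col, target, eps=0.5):
    # Weight of Evidence: log of the event-rate share vs the
    # non-event-rate share per category, smoothed by eps.
    grp = df.groupby(col)[target].agg(['sum', 'count'])
    events = grp['sum'] + eps
    non_events = grp['count'] - grp['sum'] + eps
    total_events = df[target].sum() + eps
    total_non = (len(df) - df[target].sum()) + eps
    woe = np.log((events / total_events) / (non_events / total_non))
    return df[col].map(woe)
```

In practice the WoE mapping should be fitted on the train fold only and then applied to validation/test, to avoid target leakage.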
1. weighted voting classifier (LGBMClassifier and CatBoostClassifier, each with class weights)
2. Probability threshold moving
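The modelling idea above (weighted soft voting plus probability threshold moving) can be sketched with scikit-learn; here `GradientBoostingClassifier` and `LogisticRegression` are stand-ins for the LGBMClassifier/CatBoostClassifier pair I used (both of which accept class weights), and the voting weights and threshold grid are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data standing in for the claims dataset.
X, y = make_classification(n_samples=500, weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 1. Weighted soft-voting ensemble (stand-ins for LGBM + CatBoost).
vote = VotingClassifier(
    estimators=[('gbm', GradientBoostingClassifier(random_state=0)),
                ('lr', LogisticRegression(class_weight='balanced',
                                          max_iter=1000))],
    voting='soft', weights=[2, 1])
vote.fit(X_tr, y_tr)

# 2. Threshold moving: pick the probability cut-off that maximises F1
#    on held-out data instead of the default 0.5.
proba = vote.predict_proba(X_te)[:, 1]
thresholds = np.linspace(0.1, 0.9, 81)
best_t = max(thresholds, key=lambda t: f1_score(y_te, proba >= t))
preds = (proba >= best_t).astype(int)
```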
Can you please share your code with me, or do you have a GitHub account you can share it on?
similar solution:
NOTE: an untidy, not-well-documented notebook that shows roughly what my solution looked like:
https://nbviewer.jupyter.org/github/MusahO/AutoInland-Vehicle-Insurance-Claim-Challenge/blob/main/AutoInland-Vehicle-Insurance-Claim-Challenge.ipynb
If you have any ideas on how to improve my code, I gladly welcome your two cents.
Alright