I think the LB score may be misleading. The data distribution is extremely different, and the public/private split could have been chosen on purpose to create a large gap between LB and CV. So I think it is better to focus on developing a solid CV scheme and not pay much attention to the LB. This is how I am approaching this competition. On top of all that, mere luck can play a significant role in such competitions, so good luck!
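For anyone who wants a starting point for the CV scheme, here is a minimal sketch. It assumes the target is binary and imbalanced (my assumption, the thread does not say), and the helper name `make_folds` is just something I made up for illustration. It uses scikit-learn's `StratifiedKFold` so each fold keeps roughly the same class ratio.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def make_folds(y, n_splits=5, seed=42):
    """Return an array giving the validation fold index of every row."""
    y = np.asarray(y)
    folds = np.full(len(y), -1, dtype=int)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # split() only needs X for its length, so a placeholder array is enough.
    for fold, (_, valid_idx) in enumerate(skf.split(np.zeros(len(y)), y)):
        folds[valid_idx] = fold
    return folds


# Toy target (assumption): 90 negatives and 10 positives.
y_toy = np.array([0] * 90 + [1] * 10)
folds = make_folds(y_toy)
```

Fixing the fold assignment once and reusing it for every experiment keeps CV scores comparable between models, which matters more when the LB cannot be trusted.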
>You will work with loan and customer data from Kenya (train and test set) and Ghana (test set). This split emphasises the need for models that generalise well, i.e. perform well across different countries and financial contexts.
I think CV may not be reliable either, because the test set contains data from another country (Ghana). As you said, we also don't know how the leaderboard data is split, so the LB is unreliable too. Trust your luck.
Also, with this type of data, simple aggregations and interactions tend to be the most useful features. Combined with GBT models they can be very powerful, so make sure to try as many combinations as you can.
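A rough sketch of what "simple aggregations and interactions" could look like in pandas. The `customer_id` column and the function name `add_customer_aggregates` are hypothetical, so swap in whatever key the real data uses. `Total_Amount` is the column used elsewhere in this thread.

```python
import pandas as pd


def add_customer_aggregates(data, key="customer_id"):
    """Return a copy of `data` with per-customer aggregates and one interaction."""
    out = data.copy()
    grouped = out.groupby(key)["Total_Amount"]
    # Aggregations: broadcast per-customer statistics back to every loan row.
    out["cust_loan_count"] = grouped.transform("count")
    out["cust_mean_amount"] = grouped.transform("mean")
    # Interaction: how this loan compares to the customer's typical loan.
    out["amount_vs_cust_mean"] = out["Total_Amount"] / out["cust_mean_amount"]
    return out


# Toy data (assumption): `customer_id` is a hypothetical key column.
toy = pd.DataFrame(
    {"customer_id": [1, 1, 2], "Total_Amount": [100.0, 300.0, 50.0]}
)
result = add_customer_aggregates(toy)
```

If the aggregates are computed on train and test together they also absorb test-set information, so it is worth deciding deliberately whether to fit them on the training rows only.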
Feature engineering:
data['interest_rate'] = (data['Total_Amount_to_Repay'] - data['Total_Amount']) / data['Total_Amount'] * 100
This is the best feature.
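A slightly more defensive variant of the same feature, as a sketch. It assumes some rows might have `Total_Amount` equal to zero (I have not checked whether that happens in this dataset), in which case the plain division would give `inf`. The function name `add_interest_rate` is just for illustration.

```python
import numpy as np
import pandas as pd


def add_interest_rate(data):
    """Return a copy of `data` with an `interest_rate` column in percent."""
    out = data.copy()
    # Treat a zero principal as missing so the division yields NaN, not inf.
    principal = out["Total_Amount"].replace(0, np.nan)
    out["interest_rate"] = (
        (out["Total_Amount_to_Repay"] - principal) / principal * 100
    )
    return out


# Toy rows (made up): a 10% loan, a 0% loan, and a zero-principal row.
toy = pd.DataFrame(
    {
        "Total_Amount": [1000.0, 500.0, 0.0],
        "Total_Amount_to_Repay": [1100.0, 500.0, 50.0],
    }
)
result = add_interest_rate(toy)
```

GBT libraries such as LightGBM and XGBoost handle `NaN` natively, so leaving the undefined rows as missing is usually fine.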
Will try this.