You will work with loan and customer data from Kenya (train and test set) and Ghana (test set). This split emphasises the need for models that generalise well, i.e. perform well across different countries and financial contexts.
You will notice that a customer ID can appear in multiple rows of the dataset, and for a given customer, a loan ID can also appear with different lender IDs. For example, we can have the same customer ID on three rows with the same loan ID on each but a different lender ID. This observation means that the loan is funded by three separate lenders.
Additional data pulled from the Federal Reserve Economic Data (FRED) portal are provided to enrich model performance and offer insights into features that impact on credit scoring.
The economic indicators are:
The data downloaded from FRED is saved in a separate CSV in the data section.
Join the largest network for
data scientists and AI builders