
African Credit Scoring Challenge

Helping Africa
$5 000 USD
Completed (over 1 year ago)
1983 joined
1020 active
Start: Nov 29, 2024
Close: Jan 12, 2025
Reveal: Jan 13, 2025
Rounding problem -> Lender_portion_to_be_repaid
Data · 28 Dec 2024, 18:21 · 15

df['calc_Lender_portion_Funded'] = (df['Amount_Funded_By_Lender'] / df['Total_Amount'])

df['calc_Lender_portion_to_be_repaid'] = (df['calc_Lender_portion_Funded']) * df['Total_Amount_to_Repay']

   calc_Lender_portion_to_be_repaid   Lender_portion_to_be_repaid
0                            120.85                        121.00
1                           7793.70                       7794.00
2                           1428.40                       1428.00
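A quick NumPy check, using just the three rows shown above, supports the idea that the provided target column is the computed lender portion rounded to the nearest whole number:

```python
import numpy as np

# Computed lender portions vs the values in the provided target column
calc = np.array([120.85, 7793.7, 1428.4])
given = np.array([121.00, 7794.00, 1428.00])

# Rounding the computed values reproduces the given column exactly
print(np.allclose(np.round(calc), given))  # True
```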

Why did you round the values in the 'Lender_portion_to_be_repaid' column?

Discussion 15 answers
CodeJoe

You can replace it with the actual values if you feel that will give a better score. I also realized that.

28 Dec 2024, 18:45
Upvotes 1
Juliuss
Freelance

Replacing with the actual values makes things worse. I wonder what I'm doing wrong; I can't even break past 0.69, and God knows how many times I've tried!

CodeJoe

Hmm let me help you out.

You can use hyperparameter tuning to reach 0.70, but in my experience hyperparameter tuning alone won't take you past 0.70 to even 0.71. Here are some parameters that can get you to 0.70 with LightGBM:

best_params1 = {'booster': 'lightgbm',
                'n_estimators': 500,
                'max_depth': 8,
                'learning_rate': 0.06487257646412693,
                'num_leaves': 60,
                'feature_fraction': 0.673436396881704,
                'bagging_fraction': 0.987922773302477,
                'lambda_l1': 0.21968694469084882,
                'lambda_l2': 0.9887865080734871}
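One caveat worth flagging: 'booster' is an XGBoost-style key (LightGBM's equivalent is `boosting_type`), and `feature_fraction`, `bagging_fraction`, `lambda_l1`, `lambda_l2` are native-API names whose scikit-learn-wrapper equivalents are `colsample_bytree`, `subsample`, `reg_alpha`, `reg_lambda`. A small sketch (the alias mapping is an assumption; verify against the LightGBM docs for your version) that remaps the dict before passing it to `lightgbm.LGBMRegressor(**lgbm_params)` — the remapping itself needs no LightGBM install:

```python
# Map native-API parameter names to LGBMRegressor argument names
# (assumed mapping; check the LightGBM documentation for your version)
alias = {
    'feature_fraction': 'colsample_bytree',
    'bagging_fraction': 'subsample',
    'lambda_l1': 'reg_alpha',
    'lambda_l2': 'reg_lambda',
}

best_params1 = {'booster': 'lightgbm', 'n_estimators': 500, 'max_depth': 8,
                'learning_rate': 0.06487257646412693, 'num_leaves': 60,
                'feature_fraction': 0.673436396881704, 'bagging_fraction': 0.987922773302477,
                'lambda_l1': 0.21968694469084882, 'lambda_l2': 0.9887865080734871}

# Drop the XGBoost-style 'booster' key and rename the rest
lgbm_params = {alias.get(k, k): v for k, v in best_params1.items() if k != 'booster'}
print(sorted(lgbm_params))
# Then: model = lightgbm.LGBMRegressor(**lgbm_params)
```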

And these are the features:

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Combine datasets for consistent feature engineering
data = pd.concat([train, test]).reset_index(drop=True)

# Convert date columns to datetime
data['disbursement_date'] = pd.to_datetime(data['disbursement_date'], errors='coerce')
data['due_date'] = pd.to_datetime(data['due_date'], errors='coerce')

# Extract temporal features (month, day, year) from the dates
date_cols = ['disbursement_date', 'due_date']
for col in date_cols:
    data[col + '_month'] = data[col].dt.month
    data[col + '_day'] = data[col].dt.day
    data[col + '_year'] = data[col].dt.year

# Calculate loan term and weekday features
data['loan_term_days'] = (data['due_date'] - data['disbursement_date']).dt.days
data['disbursement_weekday'] = data['disbursement_date'].dt.weekday
data['due_weekday'] = data['due_date'].dt.weekday

# Create financial ratios and transformations
data['repayment_ratio'] = data['Total_Amount_to_Repay'] / data['Total_Amount']
data['log_Total_Amount'] = np.log1p(data['Total_Amount'])

# Handle categorical variables
cat_cols = data.select_dtypes(include='object').columns

# One-hot encoding for loan type
data = pd.get_dummies(data, columns=['loan_type'], prefix='loan_type', drop_first=False)
loan_type_cols = [col for col in data.columns if col.startswith('loan_type_')]
data[loan_type_cols] = data[loan_type_cols].astype(int)

# Label encoding for other categorical columns
le = LabelEncoder()
for col in [c for c in cat_cols if c not in ['loan_type', 'ID']]:
    data[col] = le.fit_transform(data[col])

# Split back into train and test
train_df = data[data['ID'].isin(train['ID'].unique())]
test_df = data[data['ID'].isin(test['ID'].unique())]

# Define features for modeling
features_for_modelling = [col for col in train_df.columns
                          if col not in date_cols + ['ID', 'target', 'country_id', 'customer_id', 'lender_id']]

print(f"The shape of train_df is: {train_df.shape}")
print(f"The shape of test_df is: {test_df.shape}")
print(f"The shape of train is: {train.shape}")
print(f"The shape of test is: {test.shape}")
print(f"The features for modelling are:\n{features_for_modelling}")
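As a quick sanity check, the date-feature step can be exercised on a tiny made-up frame (same column names as the snippet, but the rows are invented):

```python
import pandas as pd

# Two made-up loans, using the same column names as the snippet above
demo = pd.DataFrame({
    'disbursement_date': ['2024-01-01', '2024-02-15'],
    'due_date': ['2024-01-31', '2024-03-15'],
})
for col in ['disbursement_date', 'due_date']:
    demo[col] = pd.to_datetime(demo[col], errors='coerce')
    demo[col + '_month'] = demo[col].dt.month
    demo[col + '_day'] = demo[col].dt.day
    demo[col + '_year'] = demo[col].dt.year

# Loan term in days, as in the snippet (2024 is a leap year, hence 29)
demo['loan_term_days'] = (demo['due_date'] - demo['disbursement_date']).dt.days
print(demo['loan_term_days'].tolist())  # [30, 29]
```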

I hope this helps.

Juliuss
Freelance

This is so insightful! Let me try this out and see.

CodeJoe

Sure, good luck. I'm still searching for ways to improve it above 0.71. I got stuck.

Juliuss
Freelance

Good luck to you, man. I'm sure the top guys are doing crazy POST-PROCESSING, no doubt about it! You can cautiously try that out as well if you have any ideas. How well does your CV correlate with the LB?

CodeJoe

My CV is 0.89. But wait, how will you do post-processing here?

Juliuss
Freelance

Threshold tuning, ensembling the outputs of many ML models, maybe some ideas around the Ghana predictions...?
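For the ensembling part, a minimal sketch of what that post-processing could look like (the prediction arrays and blend weights here are made up, purely for illustration):

```python
import numpy as np

# Made-up predictions from three hypothetical models on three test rows
pred_lgbm = np.array([0.95, 120.4, 7790.2])
pred_xgb  = np.array([1.10, 119.8, 7801.5])
pred_cat  = np.array([0.90, 121.0, 7795.0])

# Weighted blend, then clip negatives since a repaid amount can't be below 0
weights = np.array([0.5, 0.3, 0.2])
blend = weights[0] * pred_lgbm + weights[1] * pred_xgb + weights[2] * pred_cat
blend = np.clip(blend, 0, None)
print(blend)
```

In practice the weights would be chosen by validating the blend on out-of-fold predictions rather than picked by hand.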

CodeJoe

Oh okay, interesting. I'll try that.

CodeJoe

Oh sorry, I didn't see the last part. Did you say you want some ideas around the Ghanaian predictions?

Juliuss
Freelance

No, I meant that they could be trying out some tricks on the Ghana predictions, since those have a different distribution than the train data. Do you believe something can be done there with some ideas?
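One way to act on that intuition, sketched with entirely made-up numbers: rescale predictions for rows from a country whose distribution differs, using a correction factor estimated elsewhere (e.g. by comparing train-target and prediction means on that country's rows). The country labels, predictions, and factor below are all hypothetical:

```python
import numpy as np

# Made-up predictions and country labels for four test rows
preds = np.array([100.0, 200.0, 150.0, 300.0])
country = np.array(['Ghana', 'Kenya', 'Ghana', 'Kenya'])

# Hypothetical multiplicative correction for Ghana rows only
ghana_factor = 0.9

adjusted = np.where(country == 'Ghana', preds * ghana_factor, preds)
print(adjusted)
```

Whether such a correction helps would have to be validated carefully, since it amounts to fitting the leaderboard if done blindly.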

CodeJoe

Yeah you have a point. I will try that.

Luis_okech
Multimedia University of Kenya

How do you decide when to use such hyperparameters, grid search?

'learning_rate': 0.06487257646412693,
'num_leaves': 60,
'feature_fraction': 0.673436396881704,
'bagging_fraction': 0.987922773302477,
'lambda_l1': 0.21968694469084882,
'lambda_l2': 0.9887865080734871

CodeJoe

To be frank, you'll have to play with it for a while. lambda_l1 and lambda_l2 are regularization parameters that help prevent overfitting.

Luis_okech
Multimedia University of Kenya

It performs better than XGB.