# Fraction of the loan funded by this lender, and that lender's share of the repayment
df['calc_Lender_portion_Funded'] = df['Amount_Funded_By_Lender'] / df['Total_Amount']
df['calc_Lender_portion_to_be_repaid'] = df['calc_Lender_portion_Funded'] * df['Total_Amount_to_Repay']
   calc_Lender_portion_to_be_repaid   Lender_portion_to_be_repaid
0                            120.85                        121.00
1                           7793.70                       7794.00
2                           1428.40                       1428.00
Why did you round the values in the 'Lender_portion_to_be_repaid' column?
You can replace it with the actual values if you feel that will give a better score. I also realized that.
Replacing them with the actual values makes things worse. I wonder what I'm doing wrong; I can't even break past 0.69, and God knows how much I've tried!
Hmm, let me help you out.
You can use hyperparameter tuning to reach 70, but in my experience hyperparameter tuning alone won't take you past 70 to even 71. Let me give you some parameters that can get you to 70 with LightGBM:
best_params1 = {
    'booster': 'lightgbm',  # from the tuning setup; not a native LightGBM constructor argument
    'n_estimators': 500,
    'max_depth': 8,
    'learning_rate': 0.06487257646412693,
    'num_leaves': 60,
    'feature_fraction': 0.673436396881704,
    'bagging_fraction': 0.987922773302477,
    'lambda_l1': 0.21968694469084882,
    'lambda_l2': 0.9887865080734871,
}
And these are the features:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Combine datasets for consistent feature engineering
data = pd.concat([train, test]).reset_index(drop=True)

# Convert date columns to datetime and extract temporal features
date_cols = ['disbursement_date', 'due_date']
for col in date_cols:
    data[col] = pd.to_datetime(data[col], errors='coerce')
    # Extract month, day, year
    data[col + '_month'] = data[col].dt.month
    data[col + '_day'] = data[col].dt.day
    data[col + '_year'] = data[col].dt.year

# Calculate loan term and weekday features
data['loan_term_days'] = (data['due_date'] - data['disbursement_date']).dt.days
data['disbursement_weekday'] = data['disbursement_date'].dt.weekday
data['due_weekday'] = data['due_date'].dt.weekday

# Create financial ratios and transformations
data['repayment_ratio'] = data['Total_Amount_to_Repay'] / data['Total_Amount']
data['log_Total_Amount'] = np.log1p(data['Total_Amount'])

# Handle categorical variables
cat_cols = data.select_dtypes(include='object').columns

# One-hot encoding for loan type
data = pd.get_dummies(data, columns=['loan_type'], prefix='loan_type', drop_first=False)
loan_type_cols = [col for col in data.columns if col.startswith('loan_type_')]
data[loan_type_cols] = data[loan_type_cols].astype(int)

# Label encoding for the other categorical columns
# (fit on the combined data so train and test share the same codes)
for col in [c for c in cat_cols if c not in ['loan_type', 'ID']]:
    data[col] = LabelEncoder().fit_transform(data[col])
# Split back into train and test
train_df = data[data['ID'].isin(train['ID'].unique())]
test_df = data[data['ID'].isin(test['ID'].unique())]
# Define features for modeling
features_for_modelling = [col for col in train_df.columns if col not in date_cols + ['ID', 'target', 'country_id', 'customer_id', 'lender_id' ]]
print(f"The shape of train_df is: {train_df.shape}")
print(f"The shape of test_df is: {test_df.shape}")
print(f"The shape of train is: {train.shape}")
print(f"The shape of test is: {test.shape}")
print(f"The features for modelling are:\n{features_for_modelling}")
I hope this helps.
This is so insightful! Let me try this out and see.
Sure, good luck. I'm still searching for ways to push above 71; I got stuck.
Good luck to you too, man. I'm sure the top guys are doing crazy POST PROCESSING, no doubt about it! You can cautiously try that as well if you have any ideas. How does your CV correlate with the LB?
89 CV. But wait, how would you do post-processing here?
Threshold tuning, ensembling the outputs of many ML models, maybe some ideas around the Ghana predictions...?
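For instance, threshold tuning on a validation split could look like this (just a sketch; val_probs and y_val come from whatever holdout you use, and F1 is my assumed metric):

import numpy as np
from sklearn.metrics import f1_score

# Sweep candidate cutoffs and keep the one that maximizes validation F1
thresholds = np.linspace(0.1, 0.9, 81)
scores = [f1_score(y_val, (val_probs >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
print(f"Best threshold: {best_t:.2f}, F1: {max(scores):.4f}")

# Apply the tuned cutoff to the test predictions instead of the default 0.5
test_preds = (model.predict_proba(test_df[features_for_modelling])[:, 1] >= best_t).astype(int)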
Oh okay. Interesting. I'll try that
Oh sorry, I didn't see the last part. Did you say you want some ideas around the Ghanaian predictions?
No, I meant that they could be trying out some tricks on the Ghana predictions, since those have a different distribution than the train data. Do you believe something can be done there?
Yeah you have a point. I will try that.
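One way to quantify that shift before acting on it (my suggestion, not something from this thread) is adversarial validation: train a classifier to separate train rows from test rows, and if it does much better than chance, the distributions really do differ:

import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import cross_val_score

# Label train rows 0 and test rows 1, then measure how separable they are
adv_X = pd.concat([train_df[features_for_modelling], test_df[features_for_modelling]])
adv_y = np.r_[np.zeros(len(train_df)), np.ones(len(test_df))]

auc = cross_val_score(lgb.LGBMClassifier(random_state=42), adv_X, adv_y,
                      cv=3, scoring='roc_auc').mean()
print(f"Adversarial AUC: {auc:.3f}")  # ~0.5 means similar distributions; near 1.0 means a strong shift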
How do you decide when to use hyperparameters like these, grid search?

'learning_rate': 0.06487257646412693,
'num_leaves': 60,
'feature_fraction': 0.673436396881704,
'bagging_fraction': 0.987922773302477,
'lambda_l1': 0.21968694469084882,
'lambda_l2': 0.9887865080734871
To be frank, you'll have to play with it for a while. lambda_l1 and lambda_l2 are regularization terms that help prevent overfitting.
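For what it's worth, long decimals like 0.06487257646412693 usually come out of a random or Bayesian search (e.g. Optuna) rather than a plain grid. Here is a minimal sketch of that kind of search; the ranges, the F1 scoring, and the X/y names are my assumptions:

import optuna
import lightgbm as lgb
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Sample one candidate configuration per trial
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 200, 1000),
        'max_depth': trial.suggest_int('max_depth', 4, 12),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2, log=True),
        'num_leaves': trial.suggest_int('num_leaves', 20, 120),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.5, 1.0),
        'bagging_freq': 1,  # bagging_fraction only takes effect when bagging_freq > 0
        'lambda_l1': trial.suggest_float('lambda_l1', 1e-3, 10.0, log=True),
        'lambda_l2': trial.suggest_float('lambda_l2', 1e-3, 10.0, log=True),
    }
    model = lgb.LGBMClassifier(**params, random_state=42)
    return cross_val_score(model, X, y, cv=3, scoring='f1').mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print(study.best_params)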
In my experience LightGBM also just performs better than XGBoost here.