Alvin Smart Money Management Classification Challenge
Can you classify purchases recorded on Alvin into different categories?
$3 000 USD
Ended 6 months ago
220 active · 455 enrolled
Financial Services
What Cross-validation Techniques are you using?
Help · 27 Jun 2022, 11:33 · 10

I observe that education and rental mortgage have only two samples each in the training dataset, so I cannot fit my data with stratified or K-fold cross-validation. What is your opinion and solution to this?

Discussion 10 answers

This might not be entirely right, but try using the train set to predict the missing target on the extra data before you do the main work. This might help.

27 Jun 2022, 11:45
Upvotes 2

If we use the train set to predict the missing target on the extra data and then use those predictions to train a model, there is a big chance of training the model on wrong labels.
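One common way to limit the wrong-label risk is to keep only the pseudo-labels the model is confident about. Here is a minimal sketch, assuming scikit-learn is available; the toy data, the LogisticRegression model and the 0.9 cutoff are all assumptions for illustration, not something from the competition.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: a labelled train set with two classes,
# plus unlabelled "extra" rows (stand-in for the extra data).
X_train = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
X_extra = rng.normal(0, 2, (40, 2))

model = LogisticRegression().fit(X_train, y_train)

# Predicted class probabilities for the unlabelled rows.
proba = model.predict_proba(X_extra)
pseudo_labels = model.classes_[proba.argmax(axis=1)]
confidence = proba.max(axis=1)

# Keep only rows the model is confident about; 0.9 is an arbitrary cutoff.
keep = confidence >= 0.9
X_aug = np.vstack([X_train, X_extra[keep]])
y_aug = np.concatenate([y_train, pseudo_labels[keep]])
print(f"kept {keep.sum()} of {len(X_extra)} pseudo-labelled rows")
```

The confident rows can still be wrong, so it is worth comparing validation scores with and without the augmented rows.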

I used StratifiedShuffleSplit and it worked. StratifiedKFold will work if you don't set shuffle=True, since shuffling creates random folds each time, and education and rental mortgage might both end up in just one fold. Also try encoding the target class; that will help.
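Roughly what this looks like, as a sketch on toy data. The category names "Bills" and "Groceries", the placeholder features, n_splits=2 and test_size=0.5 are assumptions chosen so that each side of a split can hold one sample of every 2-sample class.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy target: two classes have only 2 samples each,
# mimicking the rare categories discussed in this thread.
labels = np.array(["Bills"] * 10 + ["Groceries"] * 10
                  + ["Education"] * 2 + ["Rent mortgage"] * 2)
X = np.arange(len(labels)).reshape(-1, 1)  # placeholder features

# Encode the string categories as integers 0..n_classes-1.
y = LabelEncoder().fit_transform(labels)

# n_splits=2 matches the size of the smallest class, so every fold
# can hold one sample of each rare class.
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    print("fold", fold, "val class counts:", np.bincount(y[val_idx], minlength=4))

# StratifiedShuffleSplit draws independent random stratified splits;
# test_size must leave at least one sample per class on each side.
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)
for train_idx, val_idx in sss.split(X, y):
    print("train:", np.bincount(y[train_idx], minlength=4),
          "val:", np.bincount(y[val_idx], minlength=4))
```

With more folds than samples in the smallest class, some folds will necessarily contain none of that class, so scores for those categories will be noisy whichever splitter is used.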

27 Jun 2022, 11:55
Upvotes 1

Yes, that is my observation. I am using train_test_split.

You can use StratifiedKFold; it will work for you and is better for our case.

27 Jun 2022, 16:34
Upvotes 0

Are there 2 categories in the target variable that occur only twice?

Please clarify.

If you check the target variable distribution, Health and Rent mortgage each have only 2 values in the training data.
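A quick way to check this yourself, sketched with a made-up target list. The counts and the "Bills" and "Groceries" names are hypothetical, and n_splits=5 is just an assumed fold count.

```python
from collections import Counter

# Hypothetical target column; in practice this would be the category
# column of the training file (values here are made up).
target = (["Bills"] * 10 + ["Groceries"] * 10
          + ["Health"] * 2 + ["Rent mortgage"] * 2)

counts = Counter(target)
n_splits = 5  # planned number of CV folds (assumption)

# Classes with fewer samples than folds cannot appear in every fold.
rare = {label: n for label, n in counts.items() if n < n_splits}
print(counts.most_common())
print("Too rare for", n_splits, "folds:", rare)
```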

I believe this is because the dataset is imbalanced, which can be solved with SMOTE, down sampling or over