
Alvin Smart Money Management Classification Challenge

Helping Kenya
$3 000 USD
Challenge completed ~3 years ago
Classification
497 joined
220 active
Start: Jun 22, 22
Close: Jul 24, 22
Reveal: Jul 24, 22
Raheem_Nasirudeen
The polytechnic ibadan
What Cross-validation Techniques are you using?
Help · 27 Jun 2022, 11:33 · 11

I noticed that education and rental mortgage have only two values each in the training dataset, so I cannot fit my data with stratified or K-fold cross-validation. What is your opinion, and how would you solve this?

Discussion 11 answers
Emmanuel360__RAIN
Robotics and artificial intelligence nigeria

I might not be entirely right, but try using the train set to predict the missing target on the extra data before you do the main work. This might help.

27 Jun 2022, 11:45
Upvotes 1
Raheem_Nasirudeen
The polytechnic ibadan

Thanks, I will look into that.

TAUIL_Abdelilah
university abdelmalek essaadi

If we use the train set to predict the missing target on the extra data and then train a model on it, there is a big chance of training the model with wrong labels.

I used StratifiedShuffleSplit and it worked. StratifiedKFold will also work if you don't set shuffle=True, since shuffling creates random folds each time and the education and rental mortgage samples might all end up in one fold. Also, try encoding the target class; that will help.
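
For anyone who wants to try this, here is a minimal sketch of both splitters on a toy target. The class names and counts are made up for illustration (only the "two samples per rare class" part comes from this thread), and note that scikit-learn's class is named StratifiedKFold. The sketch caps n_splits at the size of the rarest class; as far as I know, asking for more folds than that makes StratifiedKFold emit a warning, because some folds cannot contain the rare class.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

# Toy target, made up for illustration: two frequent classes plus two
# rare classes with only 2 samples each, as described in this thread.
y = np.array(
    ["common_a"] * 48 + ["common_b"] * 48
    + ["Education"] * 2 + ["Rent / Mortgage"] * 2
)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# A class with 2 samples can appear in at most 2 validation folds,
# so cap the number of folds at the rarest class count.
n_splits = int(np.unique(y, return_counts=True)[1].min())

skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
# Check that every validation fold contains every class.
kfold_ok = all(set(y[valid_idx]) == set(y) for _, valid_idx in skf.split(X, y))

# StratifiedShuffleSplit draws independent random stratified splits,
# so the number of repetitions is not tied to the rarest class count.
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
# Check that every training portion contains every class.
shuffle_ok = all(set(y[train_idx]) == set(y) for train_idx, _ in sss.split(X, y))

print(n_splits, kfold_ok, shuffle_ok)
```

With StratifiedShuffleSplit the validation portion of a given split may still miss one of the rare classes, so it is worth checking the class counts of each split on the real data.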

27 Jun 2022, 11:55
Upvotes 1
Raheem_Nasirudeen
The polytechnic ibadan

Yes, that is my observation. I am using train_test_split.

You can use StratifiedKFold; it will work for you, and it is better for our case.

27 Jun 2022, 16:34
Upvotes 0
Raheem_Nasirudeen
The polytechnic ibadan

There are 2 categories in the target variable that occur only twice?

Please clarify.

Raheem_Nasirudeen
The polytechnic ibadan

If you check the target variable distribution, Health and Rent mortgage each have only 2 values in the training data.

Emmanuel360__RAIN
Robotics and artificial intelligence nigeria

I believe this is because the dataset is imbalanced, which can be solved with SMOTE, downsampling, or oversampling.
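
As a rough illustration of the oversampling idea, here is a minimal sketch of plain random oversampling using only the standard library. The helper name random_oversample and the toy labels are hypothetical; SMOTE itself lives in the separate imbalanced-learn package and is not shown here. It is probably safest to oversample only the training portion of each split, so that copies of the same row do not land in both train and validation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data, made up for illustration: one frequent class and two
# classes with only 2 rows each.
labels = ["common"] * 10 + ["Health"] * 2 + ["Rent / Mortgage"] * 2
rows = [[i] for i in range(len(labels))]

new_rows, new_labels = random_oversample(rows, labels)
counts = Counter(new_labels)
print(counts)
```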

ML_Wizzard
Nasarawa State University

I used StratifiedShuffleSplit and it worked. Try using train_test_split.
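
A minimal sketch of the train_test_split route, assuming a stratified hold-out is what is meant here. The toy labels and sizes are made up for illustration. Passing stratify=y asks scikit-learn to keep the class proportions in both parts; with only 2 rows in a class, the validation part may still end up without that class, depending on test_size.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy target, made up for illustration: one frequent class and two
# classes with only 2 rows each.
y = np.array(["common"] * 96 + ["Health"] * 2 + ["Rent / Mortgage"] * 2)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# stratify=y keeps the class proportions as close as possible in both parts.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

train_classes = set(y_train)
print(len(y_train), len(y_valid), sorted(train_classes))
```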

1 Apr 2023, 00:29
Upvotes 0