What Cross-validation Techniques are you using?
Help · 27 Jun 2022, 11:33 · 10

I observed that education and rental mortgage each have only two samples in the training dataset, so I cannot fit my data with stratified or K-fold cross-validation. What is your opinion, and how would you solve this?

This might not be entirely right, but try using the train set to predict the missing targets on the extra data before you do the main work. This might help.

27 Jun 2022, 11:45
If we use the train set to predict the missing targets on the extra data and then train a model on those predictions, there is a big chance of training the model on wrong labels.

I used StratifiedShuffleSplit and it worked. StratifiedKFold should also work if you don't set shuffle=True, since shuffling creates random folds each time and the education and rental mortgage samples might all end up in one fold. Also try encoding the target class; that will help.
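
For anyone who wants to try the StratifiedShuffleSplit route, here is a minimal sketch. The toy target (class names "a" and "b", the row counts, and the 50% test size) is made up for illustration and is not the competition data; the only thing carried over from the thread is that two classes have 2 rows each.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy target: two common classes plus two rare classes with 2 rows each.
y = np.array(["a"] * 40 + ["b"] * 36 + ["education"] * 2 + ["rental mortgage"] * 2)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

rare = {"education", "rental mortgage"}

# test_size=0.5 is an assumption: it lets each 2-row class put one row on each side.
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)

rare_in_both = []
for train_idx, test_idx in sss.split(X, y):
    # Check that both rare classes appear in the train part and in the test part.
    in_train = rare <= set(y[train_idx])
    in_test = rare <= set(y[test_idx])
    rare_in_both.append(in_train and in_test)

print(rare_in_both)
```

With a much smaller test size, a 2-row class may end up entirely in the train part, so it is worth checking the splits like this before trusting the scores.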

27 Jun 2022, 11:55
Yes, that is my observation. I am using train_test_split.

You can use StratifiedKFold. It will work for you, and it is better for our case.
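
A rough sketch of that suggestion, using scikit-learn's StratifiedKFold class. The toy target below is invented for illustration, and n_splits=2 is an assumption chosen because the rarest classes have only 2 rows; asking for more folds than the rarest class has rows makes scikit-learn emit a warning, and some folds then miss that class.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy target: two common classes plus two rare classes with 2 rows each.
y = np.array(["a"] * 40 + ["b"] * 36 + ["education"] * 2 + ["rental mortgage"] * 2)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

rare = {"education", "rental mortgage"}

# n_splits is capped at the size of the rarest class (2 here) so that
# every fold can receive one row from each rare class.
skf = StratifiedKFold(n_splits=2)

# For each fold, record whether both rare classes show up in the validation part.
folds_ok = [rare <= set(y[val_idx]) for _, val_idx in skf.split(X, y)]
print(folds_ok)
```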

27 Jun 2022, 16:34
Are there two categories in the target variable that occur only twice?

Could you please clarify?

If you check the target variable distribution, Health and Rent mortgage each have only 2 samples in the training data.
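
A quick way to run that check, sketched with only the standard library. The list below is a made-up stand-in for the real target column; "class_a" and "class_b" and all counts except the two 2-row classes are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for the target column of the training data.
target = (
    ["class_a"] * 6
    + ["class_b"] * 4
    + ["Health"] * 2
    + ["Rent mortgage"] * 2
)

# Count how many rows each class has.
counts = Counter(target)

# Keep only the classes with 2 rows or fewer.
rare = {label: n for label, n in counts.items() if n <= 2}
print(rare)
```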

I believe this is because the dataset is imbalanced, which can be addressed with SMOTE, downsampling, or oversampling
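
As a rough illustration of the oversampling option, here is a sketch of plain random oversampling (duplicating rare rows), not SMOTE. The toy data and the min_count threshold of 10 are assumptions made for the example. It is usually safer to resample only the training part of each split, so that duplicated rows do not leak into validation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two common classes plus two rare classes with 2 rows each.
y = np.array(["a"] * 40 + ["b"] * 36 + ["Health"] * 2 + ["Rent mortgage"] * 2)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

min_count = 10  # assumed target: every class should have at least this many rows

idx_parts = []
for label in np.unique(y):
    idx = np.flatnonzero(y == label)
    if len(idx) < min_count:
        # Draw extra row indices with replacement from the rare class.
        extra = rng.choice(idx, size=min_count - len(idx), replace=True)
        idx = np.concatenate([idx, extra])
    idx_parts.append(idx)

all_idx = np.concatenate(idx_parts)
X_res, y_res = X[all_idx], y[all_idx]

# Class counts after oversampling.
labels, counts = np.unique(y_res, return_counts=True)
res_counts = {str(label): int(n) for label, n in zip(labels, counts)}
print(res_counts)
```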