Is anyone else getting a large gap between Validation and Testing Accuracy?
Help · 9 Jul 2020, 16:35 · 4
I am validating on about 1,400,000 samples and getting a 0.2 F1 score, but when I predict on the test set and submit, my score is much lower, about 0.01. Is anyone else having a similar problem?
I extracted the test 'customer_id' and 'vendor_id' features from the 'CID X LOC_NUM X VENDOR' column of SampleSubmission by splitting the strings on the ' X ' delimiter. The train dataset already had the 'customer_id' and 'vendor_id' features. I then combined the train set with the test set derived from the SampleSubmission file and factorized 'customer_id' using pandas.factorize(). Finally, I split the combined, factorized dataset back into train and test sets and used sklearn's train_test_split to get a validation subset of the training data.
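A minimal sketch of the steps above (the column names come from the thread; the tiny inline frames stand in for the real CSV files, which are not shown here):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data: train already has customer_id/vendor_id; the test ids
# live in the 'CID X LOC_NUM X VENDOR' column of SampleSubmission.
train = pd.DataFrame({
    "customer_id": ["A1", "B2", "C3"],
    "vendor_id": [10, 20, 30],
    "target": [0, 1, 0],
})
sample_sub = pd.DataFrame(
    {"CID X LOC_NUM X VENDOR": ["A1 X 0 X 10", "D4 X 1 X 20"]}
)

# Split 'CID X LOC_NUM X VENDOR' on ' X ' to recover the test ids
parts = sample_sub["CID X LOC_NUM X VENDOR"].str.split(" X ", expand=True)
test = pd.DataFrame({"customer_id": parts[0],
                     "vendor_id": parts[2].astype(int)})

# Factorize customer_id over the combined frame so the integer codes
# are consistent between train and test
combined = pd.concat([train[["customer_id", "vendor_id"]], test],
                     ignore_index=True)
combined["customer_code"], _ = pd.factorize(combined["customer_id"])

# Split back into train/test, then carve out a validation subset
n_train = len(train)
train_feats = combined.iloc[:n_train].copy()
test_feats = combined.iloc[n_train:].copy()
X_tr, X_val, y_tr, y_val = train_test_split(
    train_feats, train["target"], test_size=0.2, random_state=42
)
```

Note that train_test_split here splits by row, not by customer, so the same customer can appear in both the training and validation subsets.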
Did you use any resampling techniques or target-based features?
I have the same problem
How do you split the Train/Validation?
I think the proper way to do it is to split on customers (i.e. before merging with locations and before merging with vendors).
That might be a reason