Is anyone else getting a large gap between Validation and Testing Accuracy?
Help · 9 Jul 2020, 16:35 · 4
I am validating on about 1,400,000 samples and getting a 0.2 F1 score, but when I predict on the test set and submit, my score is much lower, about 0.01. Is anyone else having a similar problem?
I extracted the test 'customer_id' and 'vendor_id' features from the 'CID X LOC_NUM X VENDOR' column of SampleSubmission by splitting the strings on the ' X ' delimiter. The train dataset already had the 'customer_id' and 'vendor_id' features. I then combined the train set with the test set derived from the SampleSubmission file and factorized 'customer_id' using pandas.factorize(). Finally, I split the combined, factorized dataset back into train and test sets and used sklearn's train_test_split to get a validation subset of the training data.
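A minimal sketch of the steps above (the column names come from the thread; the tiny inline frames stand in for the real CSV files, which are not shown here):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data: train already has customer_id/vendor_id; the test ids
# live in the 'CID X LOC_NUM X VENDOR' column of SampleSubmission.
train = pd.DataFrame({
    "customer_id": ["A1", "B2", "C3"],
    "vendor_id": [10, 20, 30],
    "target": [0, 1, 0],
})
sample_sub = pd.DataFrame(
    {"CID X LOC_NUM X VENDOR": ["A1 X 0 X 10", "D4 X 1 X 20"]}
)

# Split 'CID X LOC_NUM X VENDOR' on ' X ' to recover the test ids
parts = sample_sub["CID X LOC_NUM X VENDOR"].str.split(" X ", expand=True)
test = pd.DataFrame({"customer_id": parts[0],
                     "vendor_id": parts[2].astype(int)})

# Factorize customer_id over the combined frame so the integer codes
# are consistent between train and test
combined = pd.concat([train[["customer_id", "vendor_id"]], test],
                     ignore_index=True)
combined["customer_code"], _ = pd.factorize(combined["customer_id"])

# Split back into train/test, then carve out a validation subset
n_train = len(train)
train_feats = combined.iloc[:n_train].copy()
test_feats = combined.iloc[n_train:].copy()
X_tr, X_val, y_tr, y_val = train_test_split(
    train_feats, train["target"], test_size=0.2, random_state=42
)
```

Note that train_test_split here splits by row, not by customer, so the same customer can appear in both the training and validation subsets.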
Did you use any resampling techniques or target-based features?
I have the same problem
How do you split the Train/Validation?
I think the proper way to do it is to split on customers (i.e. before merging with locations and before merging with vendors).
That might be a reason