🤝 Data Talk: test_set

Akeed Restaurant Recommendation Challenge

Helping Oman

$3 000 USD

Completed (almost 6 years ago)

Skills you will learn

Prediction

Collaborative Filtering

1423 joined

242 active

Info Data Chat Leaderboard

Start

May 18, 20

Aug 16, 20

Reveal

Aug 16, 20

misericorde91

test_set

Help · 23 Jul 2020, 18:39 · 2

When we Create the id column using the three columns mentionned in the named of simple submission id and after merging this with this:

test = pd.merge(test, submission, how='outer', on=['CID X LOC_NUM X VENDOR']

we will get the same length of row for test and submission file but on the test set we will get 40300 missing values for others columns.

How to handle them ?

Discussion 2 answers

moctarjallo

Absolutely the same issue here. 40300 rows are missing from the provided test data. Hope someone is gonna answer soon.

6 Aug 2020, 02:16

Upvotes 0

rdeggau

What I do:

break the test dataset in two parts: with nulls and without nulls

testOK = test[~test['customer_id'].isnull()]

testNull = test[test['customer_id'].isnull()]

print (test.shape)

print (testOK.shape)

print (testNull.shape)

After that I got the customer_id, loc_num and vendor for the CID X LOC_NUM X VENDOR column: (only for testNull)

# fill null columns in test after Merge - for customer_id, location_number_obj and id_obj

for i, row in testNull.iterrows():

if pd.isna(row['customer_id']):

text = row['CID X LOC_NUM X VENDOR']

x1 = text.find(' X ')

x2 = text.find(' X ', x1+2)

text1 = text[0:x1]

text2 = text[x1+2: x2]

text3 = text[x2+2:]

test.at[i,'customer_id'] = text1

test.at[i,'location_number_obj'] = text2

test.at[i,'id_obj'] = text3

Them recreate the test dataset:

frames = [testOK, testNull]

test = pd.concat(frames)

After that I apply my null column strategy for that customer, Loc_num or vendor

7 Aug 2020, 14:02

Upvotes 0

Join the largest network for
data scientists and AI builders

About FAQs

Status