Primary competition visual

Akeed Restaurant Recommendation Challenge

Helping Oman
$3 000 USD
Completed (over 5 years ago)
Prediction
Collaborative Filtering
1420 joined
242 active
Starti
May 18, 20
Closei
Aug 16, 20
Reveali
Aug 16, 20
test_set
Help · 23 Jul 2020, 18:39 · 2

When we Create the id column using the three columns mentionned in the named of simple submission id and after merging this with this:

test = pd.merge(test, submission, how='outer', on=['CID X LOC_NUM X VENDOR']

we will get the same length of row for test and submission file but on the test set we will get 40300 missing values for others columns.

How to handle them ?

Discussion 2 answers

Absolutely the same issue here. 40300 rows are missing from the provided test data. Hope someone is gonna answer soon.

6 Aug 2020, 02:16
Upvotes 0

What I do:

break the test dataset in two parts: with nulls and without nulls

testOK   = test[~test['customer_id'].isnull()]

testNull = test[test['customer_id'].isnull()]

print (test.shape)

print (testOK.shape)

print (testNull.shape)

After that I got the customer_id, loc_num and vendor for the CID X LOC_NUM X VENDOR column: (only for testNull)

# fill null columns in test after Merge - for customer_id, location_number_obj and id_obj

for i, row in testNull.iterrows():

if pd.isna(row['customer_id']):

text = row['CID X LOC_NUM X VENDOR']

x1 = text.find(' X ')

x2 = text.find(' X ', x1+2)

text1 = text[0:x1]

text2 = text[x1+2: x2]

text3 = text[x2+2:]

test.at[i,'customer_id']         = text1

test.at[i,'location_number_obj'] = text2

test.at[i,'id_obj']              = text3

Them recreate the test dataset:

frames  = [testOK, testNull]

test    = pd.concat(frames)

After that I apply my null column strategy for that customer, Loc_num or vendor

7 Aug 2020, 14:02
Upvotes 0