GideonG
Zindi Ambassador to Nigeria
Please help! Converting Dictionary to Dataframe: ( Error=> AttributeError: 'dict' object has no attribute 'to_csv' )
Data · 26 Sep 2020, 16:53 · 15

I have my dictionary as so

results = {'Applicant_ID': test['Applicant_ID'], 'default_status': predictions}

Then I wanted to convert it to CSV in order to submit, so I did this

results.to_csv('first_submission.csv', index=False, sep=',', encoding='utf-8')

But this is the error I got

--------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-59-ac95985a79f4> in <module>
----> 1 results.to_csv('first_submission.csv', index=False, sep=',', encoding='utf-8')

AttributeError: 'dict' object has no attribute 'to_csv'

Please help me solve this last step. I appreciate it.

Discussion 15 answers
University of lagos

do this instead:

sub = pd.DataFrame(results)

sub.to_csv('first_submission.csv', index=False)

26 Sep 2020, 16:56
Upvotes 0
Ladoke akintola university of technology

Try this

Result = pd.DataFrame({'Applicant_ID' : test_id, 'default_status' : prediction})

Result.to_csv('s.csv',index= False)

Basically you need to create a dataframe from the dictionary before you create a csv file for submission. The suggested ways are OK.
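
A minimal sketch of that fix, using a made-up three-row test frame and made-up predictions (the real ones would come from Test.csv and your model). It writes to an in-memory buffer so it runs without touching disk; pass a filename such as 'first_submission.csv' to to_csv instead to write the actual file.

```python
import io

import pandas as pd

# Made-up stand-ins for the real test set and the model output.
test = pd.DataFrame({"Applicant_ID": ["A001", "A002", "A003"]})
predictions = [0.08, 0.79, 0.11]

# A plain dict has no to_csv method; that is what raised the AttributeError.
results = {"Applicant_ID": test["Applicant_ID"], "default_status": predictions}
assert not hasattr(results, "to_csv")

# Wrapping the dict in a DataFrame gives access to to_csv.
submission = pd.DataFrame(results)

buffer = io.StringIO()  # in-memory file; use a path to write to disk
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With index=False the row index is left out, so the file contains only the two submission columns.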

26 Sep 2020, 17:08
Upvotes 0
GideonG
Zindi Ambassador to Nigeria

Guys, thanks for your efforts. I tried all the suggestions, but now it's showing me

TypeError: 'int' object is not iterable

University of lagos

Alright, what's the datatype of your prediction? Check and confirm using type(prediction), and make sure it's not an int.

Just make sure you have the right thing in your prediction variable.
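
A quick way to run that check is to look at the shape as well as the type. This is only a sketch: describe is a hypothetical helper name, and the values passed to it are made up. An empty shape () means the object is a single scalar rather than one value per row.

```python
import numpy as np


def describe(obj):
    """Return the type name and array shape of obj (shape () means a scalar)."""
    return type(obj).__name__, np.shape(obj)


# A single int: not usable as a prediction column.
print(describe(5))

# A 1-D array: one value per row, which is what a submission column needs.
print(describe(np.array([0.1, 0.9])))
```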

Ladoke akintola university of technology

I think the problem is with your prediction. Did you use predict_proba()?

GideonG
Zindi Ambassador to Nigeria

it is int ooo

# Checking unique values

predictions = pd.DataFrame(predictions)

predictions[0].value_counts()

Below are the results:

no     33
yes     6
Name: 0, dtype: int64

GideonG
Zindi Ambassador to Nigeria

I guess it should be the prediction, but I didn't use predict_proba()

University of lagos

Either predict or predict_proba works; the only difference will be your leaderboard score.

University of lagos

You ran predictions[0].value_counts(). Does this mean that each prediction for your row is a series?

How did you make predictions with your model?

GideonG
Zindi Ambassador to Nigeria

I am not sure what I did there, it's my first time. Let me delete that and cross-check.

University of lagos

Alright. To get probabilities for the positive class, do something like this. Say your model is an SVC (note that SVC only offers predict_proba when it is created with probability=True):

model = SVC(probability=True)

model.fit(train, target)

predictions = model.predict_proba(test)[:, 1]

The : means get all the rows,

the 1 means take only the column at index 1 (the second column), which is the positive class.

I don't know how you made your predictions, but it's obvious your predictions have both classes stored in a series.
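
To see what the [:, 1] slice does without training anything, here is a small sketch on a hand-written array. The numbers are made up and simply stand in for what predict_proba returns: one row per sample and one column per class. As far as I know, scikit-learn orders those columns as in model.classes_, which is sorted, so with labels 'no' and 'yes' the 'yes' column would be index 1; it is worth confirming by printing model.classes_.

```python
import numpy as np

# Made-up stand-in for model.predict_proba(test): one row per sample,
# one column per class (column 0 = first class, column 1 = second class).
proba = np.array([
    [0.92, 0.08],
    [0.21, 0.79],
    [0.89, 0.11],
])

# All rows, only the column at index 1.
positive = proba[:, 1]

print(proba.shape)     # two-dimensional: samples by classes
print(positive.shape)  # one-dimensional: one probability per sample
```

The sliced result is one-dimensional, so it can go straight into a single submission column.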

GideonG
Zindi Ambassador to Nigeria

When I use

predictions = model.predict(X_test)

I get predictions as

array(['no', 'yes', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no', 'no', 'no', 'yes', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'yes', 'no', 'no', 'yes', 'no', 'no', 'no', 'no', 'no', 'no'], dtype=object)

When I use predictions = model.predict_proba(X_test)[:, 1]

I get the predictions below

array([7.5802974e-02, 7.8776097e-01, 1.0669037e-01, 3.1880852e-02, 5.3155970e-02, 3.9877513e-04, 5.7491781e-03, 9.4394637e-03, 8.2479358e-02, 9.8326337e-01, 8.9332980e-01, 2.3656862e-03, 3.7998256e-01, 6.3516814e-03, 7.4714171e-03, 2.7427453e-01, 8.2066584e-01, 2.8061919e-02, 3.1294174e-02, 8.8469414e-03, 3.4059985e-03, 8.1312051e-03, 4.2162776e-02, 2.8273210e-02, 7.3419884e-02, 1.8324036e-02, 8.5808383e-04, 9.6634634e-02, 7.6151644e-03, 5.5454016e-01, 1.1143746e-02, 2.1125585e-01, 8.0490875e-01, 9.5853936e-03, 3.8374925e-01, 8.1553147e-04, 1.3214785e-03, 8.8919103e-03, 2.5249668e-03], dtype=float32)

University of lagos

First of all, your test.shape seems weird. It looks like 30-something rows when it should be thousands. Also make sure you have encoded your target feature; if you didn't, no problem I guess, just use model.predict_proba(X_test).

Oh, I see now why the shape is so small: you're predicting on your validation set, which is good, but what you want to submit is the prediction on your test set. Load it with test = pd.read_csv("Test.csv") and make sure that any features you created or deleted for train, you did the same for test. In essence, your train.shape[1] must be equal to test.shape[1].

Also, a validation set of 30-something rows is small for this dataset, if I'm correct that X_test is your validation dataset.
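
One way to check that train and test went through the same feature steps is to compare their column names, not just the column counts. A rough sketch with made-up frames and column names (age, income and income_per_year are hypothetical; only default_status comes from this thread). It compares feature columns only, leaving the target aside.

```python
import pandas as pd

# Made-up frames: train has the target plus an engineered feature that test lacks.
train = pd.DataFrame({
    "age": [30, 41],
    "income": [10.0, 12.5],
    "income_per_year": [0.33, 0.30],
    "default_status": [0, 1],
})
test = pd.DataFrame({"age": [25, 52, 37], "income": [9.0, 15.0, 11.0]})

# Feature columns are everything in train except the target.
feature_cols = [c for c in train.columns if c != "default_status"]
missing_in_test = [c for c in feature_cols if c not in test.columns]
extra_in_test = [c for c in test.columns if c not in feature_cols]
print("missing in test:", missing_in_test)
print("extra in test:", extra_in_test)

# Apply the same feature engineering to test, then match the column order.
test["income_per_year"] = test["income"] / test["age"]
test = test[feature_cols]
```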

GideonG
Zindi Ambassador to Nigeria

Thank you so much, I tried all you suggested. test.shape is showing (24000, 51)

train.shape[1] and test.shape[1] are both 51

I also encoded the target variable

but my predictions from model.predict_proba(X_test) still have a length of 39

I guess that's why I still get this error

ValueError: array length 39 does not match index length 24000

immediately after running

submission = pd.DataFrame({'Applicant_ID': test['Applicant_ID'], 'default_status': predictions})
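
The same kind of error can be reproduced on made-up data. This is a sketch under one assumption taken from the reply above: that X_test is a small validation split, while test is the full frame read from Test.csv. fake_predict_proba is a hypothetical stand-in for model.predict_proba, and the age column is invented.

```python
import numpy as np
import pandas as pd


def fake_predict_proba(features):
    """Hypothetical stand-in for model.predict_proba: one row per input row."""
    n = len(features)
    return np.column_stack([np.full(n, 0.7), np.full(n, 0.3)])


test = pd.DataFrame({
    "Applicant_ID": [f"A{i}" for i in range(10)],
    "age": np.arange(20, 30),
})
X_val = pd.DataFrame({"age": [31, 45, 52]})  # small validation split

# Predictions from the validation split have the wrong length for a submission.
predictions = fake_predict_proba(X_val)[:, 1]
try:
    pd.DataFrame({"Applicant_ID": test["Applicant_ID"], "default_status": predictions})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Predicting on the test features gives one value per Applicant_ID.
predictions = fake_predict_proba(test.drop(columns=["Applicant_ID"]))[:, 1]
assert len(predictions) == len(test)
submission = pd.DataFrame({
    "Applicant_ID": test["Applicant_ID"],
    "default_status": predictions,
})
```

If len(predictions) is not equal to len(test), the predictions were made on something other than the test features.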