I have my dictionary like so:
results = {'Applicant_ID': test['Applicant_ID'], 'default_status': predictions}
Then I wanted to convert it to CSV so I could submit it, so I did this:
results.to_csv('first_submission.csv', index=False, sep=',', encoding='utf-8')
But this is the error I got:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-59-ac95985a79f4> in <module>
----> 1 results.to_csv('first_submission.csv', index=False, sep=',', encoding='utf-8')
AttributeError: 'dict' object has no attribute 'to_csv'
Please help me solve this last step. I appreciate it.
Do this instead:
sub = pd.DataFrame(results)
sub.to_csv('first_submission.csv', index=False)
Try this:
Result = pd.DataFrame({'Applicant_ID': test_id, 'default_status': prediction})
Result.to_csv('s.csv', index=False)
Basically, you need to create a DataFrame from the dictionary before you can write a CSV file for submission. Both of the suggested ways are OK.
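To make the point above concrete, here is a minimal sketch of the dict to DataFrame to CSV step. The IDs and prediction values are made up for illustration; they stand in for test['Applicant_ID'] and the real model output.

```python
import pandas as pd

# Hypothetical stand-ins for test['Applicant_ID'] and the model's predictions.
applicant_ids = pd.Series(["A001", "A002", "A003"], name="Applicant_ID")
predictions = [0.08, 0.79, 0.11]

# A plain dict has no to_csv method, so wrap it in a DataFrame first.
results = {"Applicant_ID": applicant_ids, "default_status": predictions}
submission = pd.DataFrame(results)

# With no path, to_csv returns the CSV text; index=False leaves out the row index.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a filename such as 'first_submission.csv' as the first argument writes the same text to disk instead of returning it.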
Thanks for your efforts, everyone. I tried all the suggestions, but now it's showing me:
TypeError: 'int' object is not iterable
Alright, what's the data type of your prediction? Check and confirm using type(prediction), and ensure it's not an int.
Just make sure you have the right thing in your prediction variable.
I think the problem is with your prediction. Did you use predict_proba()?
It is an int.
# Checking unique values
predictions = pd.DataFrame(predictions)
predictions[0].value_counts()
Below are the results:
I guess it should be the prediction, but I didn't use predict_proba().
Either predict or predict_proba works well; the only difference will be the leaderboard score.
predictions[0].value_counts(): does this mean that each prediction for your row is a series?
How did you make predictions with your model?
I am not sure what I did there, it's my first time. Let me delete that and cross-check.
Alright, here is what you should do to get probabilities for the positive class. Say your model is an SVC (note that SVC only provides predict_proba when it is created with probability=True):
model = SVC(probability=True)
model.fit(train, target)
predictions = model.predict_proba(test)[:, 1]
The : means get all the rows,
and the 1 means get only the column at index 1 (the second column), which is the positive class.
I don't know how you made your predictions, but it's obvious your predictions have both classes stored in a series.
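As a small illustration of the [:, 1] indexing, here is a sketch using a hand-written array shaped like the output of predict_proba for a two-class problem. The numbers are invented; in scikit-learn the column order follows model.classes_, so it is worth checking which column is the positive class.

```python
import numpy as np

# Made-up array shaped like predict_proba output for a binary problem:
# one row per sample, one column per class.
proba = np.array([
    [0.92, 0.08],
    [0.21, 0.79],
    [0.89, 0.11],
])

# All rows, column index 1 only: a flat array with one value per sample.
positive = proba[:, 1]
print(proba.shape)     # two-dimensional: (samples, classes)
print(positive.shape)  # one-dimensional: (samples,)
```

The one-dimensional result is the shape you want to put in a single submission column.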
When I use
predictions = model.predict(X_test)
I get predictions as
array(['no', 'yes', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no', 'no', 'no', 'yes', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'yes', 'no', 'no', 'yes', 'no', 'no', 'no', 'no', 'no', 'no'], dtype=object)
When I use predictions = model.predict_proba(X_test)[:, 1]
I get the predictions below:
array([7.5802974e-02, 7.8776097e-01, 1.0669037e-01, 3.1880852e-02, 5.3155970e-02, 3.9877513e-04, 5.7491781e-03, 9.4394637e-03, 8.2479358e-02, 9.8326337e-01, 8.9332980e-01, 2.3656862e-03, 3.7998256e-01, 6.3516814e-03, 7.4714171e-03, 2.7427453e-01, 8.2066584e-01, 2.8061919e-02, 3.1294174e-02, 8.8469414e-03, 3.4059985e-03, 8.1312051e-03, 4.2162776e-02, 2.8273210e-02, 7.3419884e-02, 1.8324036e-02, 8.5808383e-04, 9.6634634e-02, 7.6151644e-03, 5.5454016e-01, 1.1143746e-02, 2.1125585e-01, 8.0490875e-01, 9.5853936e-03, 3.8374925e-01, 8.1553147e-04, 1.3214785e-03, 8.8919103e-03, 2.5249668e-03], dtype=float32)
First of all, your test.shape seems weird: it looks like 30-something rows when it should be thousands. Also make sure you have encoded your target feature; if you didn't, no problem I guess, just use model.predict_proba(X_test).
Oh, I see why the shape is so small now: you're predicting on your validation set, which is good, but what you want to submit is the prediction on your test set. Load it with test = pd.read_csv("Test.csv") and ensure that any new features you created or deleted for train, you did the same for test. In essence, your train.shape[1] must be equal to test.shape[1].
Also, a validation set of 30-something rows is low for this dataset, if I'm correct that X_test is your validation dataset.
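Here is a rough sketch of that workflow: validate on a hold-out split, but build the submission from predictions on the actual test set. Everything in it is an assumption for illustration: the data is synthetic, the feature names f1 to f3 are invented, and LogisticRegression just stands in for whatever model you are using.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for Train.csv and Test.csv (hypothetical data).
train = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
target = (train["f1"] + rng.normal(scale=0.5, size=200) > 0).astype(int)
test = pd.DataFrame(rng.normal(size=(50, 3)), columns=["f1", "f2", "f3"])
test.insert(0, "Applicant_ID", [f"ID{i}" for i in range(50)])

# Hold out part of train for validation only.
X_train, X_val, y_train, y_val = train_test_split(
    train, target, test_size=0.2, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Validation predictions are for checking the model, not for submitting.
val_proba = model.predict_proba(X_val)[:, 1]

# Submission predictions come from the test set, using the same feature columns.
test_features = test.drop(columns=["Applicant_ID"])
test_proba = model.predict_proba(test_features)[:, 1]

submission = pd.DataFrame(
    {"Applicant_ID": test["Applicant_ID"], "default_status": test_proba}
)
print(len(val_proba), len(test_proba), len(submission))
```

The key point is that the array going into the submission has one value per row of test, not one per row of the validation split.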
Thank you so much, I tried everything you suggested. test.shape is showing (24000, 51)
train.shape[1] and test.shape[1] are both 51
I also encoded the target variable
but my predictions from model.predict_proba(X_test) still have a length of 39.
I guess that's why I still get this error:
ValueError: array length 39 does not match index length 24000
immediately after
submission = pd.DataFrame({'Applicant_ID': test['Applicant_ID'], 'default_status': predictions})
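The same kind of error can be reproduced on a small scale, which may help show where it comes from. This is only a sketch with made-up sizes (5 IDs, 3 predictions): pandas raises a ValueError when a Series and a plain array of different lengths are combined in one dict.

```python
import numpy as np
import pandas as pd

# Made-up sizes: 5 IDs but only 3 predictions, mimicking 24000 vs 39.
ids = pd.Series(["A", "B", "C", "D", "E"], name="Applicant_ID")
predictions = np.zeros(3)

try:
    pd.DataFrame({"Applicant_ID": ids, "default_status": predictions})
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print(err)

# A quick check to run before building the submission.
lengths_match = len(predictions) == len(ids)
print(lengths_match)
```

If the check prints False, the predictions were most likely made on a different dataset than the one the IDs come from.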