to the zindi gods here, what do you think of this?
On the 70% public evaluation split, setting every Ghana row's prediction to 1 gives a score of 0.217, while setting them all to 0 gives a score of 0.0.
import pandas as pd

test = pd.read_csv('Test.csv')
ss = pd.read_csv('SampleSubmission.csv')

# placeholder value for every row
test['target'] = 999

# setting Ghana to 1 gives a score of 0.217391304
# setting Ghana to 0 gives a score of 0.0
test.loc[test['country_id'] == 'Ghana', 'target'] = 1

ss['target'] = test['target']
ss.to_csv('sub.csv', index=False)
It's binary F1 score. Getting no prediction right, i.e. zero recall, gives an F1 of 0. Right?
Can you elaborate 🤔
The way binary F1 is usually computed, you need at least one true positive to get any precision or recall at all. True negatives don't count toward the score.
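To make that concrete, here's a small sketch (with made-up labels) comparing a hand-computed F1 = 2·TP / (2·TP + FP + FN) against scikit-learn's. Note that TN appears nowhere in the formula, which is exactly why negative predictions count for nothing:

```python
from sklearn.metrics import confusion_matrix, f1_score
import numpy as np

# toy labels, purely illustrative
y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# F1 = 2*TP / (2*TP + FP + FN) -- TN is absent from the formula
manual_f1 = 2 * tp / (2 * tp + fp + fn)

print(manual_f1)                    # 0.666...
print(f1_score(y_true, y_pred))     # matches
```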
You can see it with the training set, if you like. Most of the dataset is zeros, so guessing all zeros would seem accurate for most rows, but the scikit-learn implementation gives 0.0 for that. See below.
But what is interesting about your experiment is that all 1's for Ghana produces a much higher score than one would expect from our training data.
from sklearn.metrics import f1_score
import numpy as np
import pandas as pd

train = pd.read_csv('Train.csv')

## guess all zeros
f1_score(train['target'], np.zeros(len(train['target'])))  ## 0.0

## guess all ones
f1_score(train['target'], np.ones(len(train['target'])))   ## 0.03598809932486554

## Or simpler, smaller sets
y_true = np.array([0, 0, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0, 0])
print(f1_score(y_true, y_pred))  ## 0.0

## flip them all around
y_true = np.array([1, 1, 1, 1, 0])
y_pred = np.array([1, 1, 1, 1, 1])
print(f1_score(y_true, y_pred))  ## 0.889

## the same "quality" of prediction delivers a very different score,
## depending on 0 or 1.
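A rough, assumption-heavy read of that 0.217 leaderboard score: if we assume the metric is binary F1 with positive label 1, that every Ghana row was predicted 1, and (a big if) that all of the public test positives sit in Ghana, then F1 = 2·P / (P + G) where P is the number of Ghana positives and G the number of Ghana rows. Rearranging gives the implied positive rate P/G:

```python
# Hypothetical back-of-envelope, NOT ground truth about the test set.
# Assumes: binary F1, all Ghana rows predicted 1, all positives in Ghana.
# Then F1 = 2P / (P + G), so P/G = F1 / (2 - F1).
f1 = 0.217391304
implied_positive_rate = f1 / (2 - f1)
print(implied_positive_rate)  # ~0.122
```

If positives also exist outside Ghana, the true Ghana positive rate would be lower still, so treat this only as an upper-bound sketch.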