
African Credit Scoring Challenge

Helping Africa
$5 000 USD
Completed (~1 year ago)
1959 joined
1022 active
Start: Nov 29, 24
Close: Jan 12, 25
Reveal: Jan 13, 25
DanielTobi0
Ibadan City Polytechnic
something weird (probing)
Help · 3 Jan 2025, 21:44 · 3

to the zindi gods here, what do you think of this?

for the 70% public evaluation split, setting the Ghana rows' predictions to 1 gives a score of 0.217, while setting them to 0 gives a score of 0.0.

import pandas as pd

test = pd.read_csv('Test.csv')
ss = pd.read_csv('SampleSubmission.csv')

test['target'] = 999

# setting ghana to 1 gives a score of 0.217391304
# setting ghana to 0 gives a score of 0.0
test.loc[test['country_id'] == 'Ghana', 'target'] = 1

ss['target'] = test['target']
ss.to_csv('sub.csv', index=False)

Discussion · 3 answers

It's the binary F1 score. Getting no positive prediction right, i.e. 0 recall, gives an F1 score of 0. Right?
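This is quick to verify: since F1 = 2·P·R/(P+R), zero recall forces the whole score to zero. A minimal sketch, assuming scikit-learn's default binary averaging and a made-up toy target:

```python
from sklearn.metrics import f1_score, recall_score
import numpy as np

# toy target containing some positives, against an all-zero prediction
y_true = np.array([0, 0, 1, 1])
y_pred = np.zeros(4, dtype=int)

# no true positives -> recall = 0 -> F1 = 2*P*R/(P+R) = 0
print(recall_score(y_true, y_pred))  # 0.0
print(f1_score(y_true, y_pred))      # 0.0
```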

3 Jan 2025, 21:53
Upvotes 2
DanielTobi0
Ibadan City Polytechnic

Can you elaborate 🤔

mlandry
H2O.ai

The way many binary classification metrics are handled, true positives are required to generate any recall or precision; correct negative predictions don't count for anything.

You can see it with the training set, if you like. Most of the data set is zero, so guessing all zeros would seem to be accurate for most. But if you use the scikit-learn implementation you get 0.0 for that. See below.

But what is interesting about your experiment is that all 1's for Ghana produces a much higher score than one would expect from our training data.

from sklearn.metrics import f1_score
import numpy as np
import pandas as pd

train = pd.read_csv('Train.csv')

## guess all zeros
f1_score(train['target'], np.zeros(len(train['target'])))
## 0.0

## guess all ones
f1_score(train['target'], np.ones(len(train['target'])))
## 0.03598809932486554

## Or simpler, smaller sets
y_true = np.array([0, 0, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0, 0])
print(f1_score(y_true, y_pred))  ## 0.0

## flip them all around
y_true = np.array([1, 1, 1, 1, 0])
y_pred = np.array([1, 1, 1, 1, 1])
print(f1_score(y_true, y_pred))  ## 0.889

## the same "quality" of prediction delivers a very different score,
## depending on whether the rare class is labelled 0 or 1.
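One way to read the 0.217 number: for an all-ones prediction, precision equals the positive rate p and recall is 1, so F1 = 2p/(1+p), which can be inverted to estimate p. This is a speculative back-of-envelope, assuming only the Ghana rows contribute to the reported score and the metric is plain binary F1:

```python
# Hypothetical back-of-envelope: invert the F1 of an all-ones prediction.
# For all-ones predictions: precision = p (positive rate), recall = 1,
# so F1 = 2p / (1 + p). Solving for p gives p = F1 / (2 - F1).
f1_all_ones = 0.217391304  # score reported in the original post
p_est = f1_all_ones / (2 - f1_all_ones)
print(p_est)  # roughly 0.122, i.e. about 12% positives
```

If that assumption holds, it would suggest a positive rate on the Ghana portion of the public test set several times higher than the ~3.6% implied by the all-ones F1 on the training set above.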