to the zindi gods here, what do you think of this?
On the 70% public evaluation split, setting every Ghana row's prediction to 1 gives a score of 0.217, while setting them all to 0 gives a score of 0.0.
import pandas as pd

test = pd.read_csv('Test.csv')
ss = pd.read_csv('SampleSubmission.csv')

# placeholder value for every row
test['target'] = 999

# setting Ghana to 1 gives a score of 0.217391304
# setting Ghana to 0 gives a score of 0.0
test.loc[test['country_id'] == 'Ghana', 'target'] = 1

ss['target'] = test['target']
ss.to_csv('sub.csv', index=False)
It's binary F1 score. Getting no prediction right, i.e. zero recall, gives an F1 of 0. Right?
Can you elaborate 🤔
The way binary F1 is usually computed, you need at least one true positive to get any precision or recall at all. True negatives don't count toward the score.
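To make that concrete, here's a small sketch (with made-up labels) comparing a hand-computed F1 = 2·TP / (2·TP + FP + FN) against scikit-learn's. Note that TN appears nowhere in the formula, which is exactly why negative predictions count for nothing:

```python
from sklearn.metrics import confusion_matrix, f1_score
import numpy as np

# toy labels, purely illustrative
y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# F1 = 2*TP / (2*TP + FP + FN) -- TN is absent from the formula
manual_f1 = 2 * tp / (2 * tp + fp + fn)

print(manual_f1)                    # 0.666...
print(f1_score(y_true, y_pred))     # matches
```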
You can see it with the training set, if you like. Most of the dataset is zeros, so guessing all zeros would seem accurate for most rows, but the scikit-learn implementation gives 0.0 for that. See below.
But what is interesting about your experiment is that all 1's for Ghana produces a much higher score than one would expect from our training data.
from sklearn.metrics import f1_score
import numpy as np
import pandas as pd

train = pd.read_csv('Train.csv')

## guess all zeros
f1_score(train['target'], np.zeros(len(train['target'])))  ## 0.0

## guess all ones
f1_score(train['target'], np.ones(len(train['target'])))   ## 0.03598809932486554

## Or simpler, smaller sets
y_true = np.array([0, 0, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0, 0])
print(f1_score(y_true, y_pred))  ## 0.0

## flip them all around
y_true = np.array([1, 1, 1, 1, 0])
y_pred = np.array([1, 1, 1, 1, 1])
print(f1_score(y_true, y_pred))  ## 0.889

## the same "quality" of prediction delivers a very different score,
## depending on 0 or 1.
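A rough, assumption-heavy read of that 0.217 leaderboard score: if we assume the metric is binary F1 with positive label 1, that every Ghana row was predicted 1, and (a big if) that all of the public test positives sit in Ghana, then F1 = 2·P / (P + G) where P is the number of Ghana positives and G the number of Ghana rows. Rearranging gives the implied positive rate P/G:

```python
# Hypothetical back-of-envelope, NOT ground truth about the test set.
# Assumes: binary F1, all Ghana rows predicted 1, all positives in Ghana.
# Then F1 = 2P / (P + G), so P/G = F1 / (2 - F1).
f1 = 0.217391304
implied_positive_rate = f1 / (2 - f1)
print(implied_positive_rate)  # ~0.122
```

If positives also exist outside Ghana, the true Ghana positive rate would be lower still, so treat this only as an upper-bound sketch.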