June Study Jam Series: Bank Transaction Volume Forecasting Challenge

Helping South Africa
500 Points
19 days left
Feature Engineering
Time-series
Forecast
107 joined
14 active
Start: Jun 10, 26
Close: Jun 30, 26
Reveal: Jun 30, 26
Can you turn behavioural signals into accurate transaction forecasts?

Every day, millions of Africans interact with their bank - swiping cards, making transfers, paying bills, receiving salaries. Behind each of these touch-points lies a rich behavioural signal. Understanding and anticipating customer transaction volumes is a foundational capability: it drives capacity planning, fraud detection, product development, and personalised service delivery. The question is deceptively simple - how many transactions will a given customer make in the next three months? - but the answer demands genuine data science skill.

In this challenge, you are provided with anonymised behavioural data for nearly 12,000 customers spanning up to 34 months of transaction history, monthly financial snapshots, and cleaned demographic profiles. Your task is to predict next_3m_txn_count - the total number of bank transactions each customer will make over a future three-month window (November 2015 through January 2016). This is a regression problem scored using Root Mean Squared Logarithmic Error (RMSLE), a metric that penalises large relative errors and handles the right-skewed distribution of transaction counts gracefully.

What makes this challenge compelling is its real-world texture. The data is not clean-room synthetic - it has the quirks of production banking data: high-cardinality free-text descriptions, partial nulls in income fields, seasonality effects from the November to January holiday period, and customers whose behaviour varies wildly from month to month. Success will reward thoughtful feature engineering, careful handling of temporal patterns, and models that generalise rather than memorise.

This challenge is more than a competition.

Prizes

This challenge is a learning opportunity. The award is 500 Zindi Points.

Evaluation

The error metric for this challenge is Root Mean Squared Logarithmic Error (RMSLE), implemented as RMSE on log-transformed values. See the submission instructions below for how to format your predictions.

Your submission file must contain exactly 2 columns: UniqueID and next_3m_txn_count.

Important - log-transformed submissions required: The platform scores using RMSE on log-transformed values, which is equivalent to RMSLE on the raw counts. You must submit the natural log of one plus each predicted count, that is log(1 + y_pred), not log(y_pred) + 1. In Python: np.log1p(y_pred). Do not submit raw predicted counts - your score will be incorrect if you do.
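The equivalence between RMSLE on raw counts and RMSE on log-transformed values can be checked numerically. The following is a minimal sketch, not the platform's scoring code: the function names rmsle and rmse and the sample counts are made up for illustration.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """RMSLE computed directly on raw (untransformed) counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def rmse(a, b):
    """Plain RMSE between two arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Made-up transaction counts, for illustration only.
y_true = np.array([10, 0, 250, 37])
y_pred = np.array([12, 1, 200, 40])

# RMSLE on raw counts ...
score_raw = rmsle(y_true, y_pred)
# ... matches RMSE once both sides have been passed through log1p.
score_log = rmse(np.log1p(y_true), np.log1p(y_pred))

print(score_raw, score_log)
```

Because the metric works on log1p values, it is mainly sensitive to relative error: the same absolute miss costs far more on a customer with few transactions than on a customer with many.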

The order of rows does not matter, but you must include predictions for all 3,584 customers in Test.csv. Your submission should look like this:

UniqueID                                 next_3m_txn_count
6b62ce75-9823-4de6-ba7b-8b2b199df239     3.456
e193e600-a706-4bc6-8597-d5d6fb171ab5     4.321
8fd44803-12ed-46ab-a146-8496d95d1b13     2.789
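A submission in the format above could be assembled roughly as follows. This is a sketch under stated assumptions: the helper names make_submission and check_submission, the toy IDs, and the output filename are hypothetical, and clipping negative predictions to zero is a choice made here, not a rule stated by the challenge. Only the two column names and the log1p transform come from the instructions.

```python
import numpy as np
import pandas as pd


def make_submission(ids, raw_counts):
    """Build a submission frame from raw predicted counts (not yet log-transformed)."""
    # Assumption: negative predictions are clipped to 0, since counts cannot be
    # negative and log1p is undefined at or below -1.
    counts = np.clip(np.asarray(raw_counts, dtype=float), 0.0, None)
    return pd.DataFrame({
        "UniqueID": list(ids),
        "next_3m_txn_count": np.log1p(counts),  # natural log of (1 + count)
    })


def check_submission(sub, expected_ids):
    """Raise ValueError if the frame breaks one of the stated format rules."""
    if list(sub.columns) != ["UniqueID", "next_3m_txn_count"]:
        raise ValueError("expected exactly the columns UniqueID, next_3m_txn_count")
    if not sub["UniqueID"].is_unique:
        raise ValueError("duplicate UniqueID values")
    if set(sub["UniqueID"]) != set(expected_ids):
        raise ValueError("submission IDs do not match the expected test IDs")
    if not np.isfinite(sub["next_3m_txn_count"]).all():
        raise ValueError("non-finite prediction values")


# Toy stand-ins for the IDs in Test.csv and for a model's raw predictions.
ids = ["id-a", "id-b", "id-c"]
raw_predictions = [30.7, -2.0, 75.0]

sub = make_submission(ids, raw_predictions)
check_submission(sub, ids)

# Hypothetical output path; uncomment to write the file.
# sub.to_csv("submission.csv", index=False)
```

With the real data, the expected IDs would come from the UniqueID column of Test.csv, and the check would then confirm that all 3,584 customers are covered. To inspect predictions as counts again, apply np.expm1 to the submitted column.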

Rules
  • Languages and tools: You may only use open-source languages and tools in building models for this challenge.
  • Who can compete: Open to all participants. To be eligible for prizes and the in-person finale, you must be a South African citizen or permanent resident, or hold a valid work permit for South Africa.
  • Submission Limits: 10 submissions per day, 300 submissions overall.
  • Team size: 0 (only individuals can compete)
  • Public-Private Split: Zindi maintains a public leaderboard and a private leaderboard for each challenge. The public leaderboard is scored on approximately 30% of the test set. The private leaderboard is revealed at the close of the challenge and is scored on the remaining 70% of the test set.
  • Data Sharing: CC BY-SA 4.0 license
  • Code sharing: Multiple accounts, or sharing of code and information across accounts not in teams, is not allowed and will lead to disqualification.