Every day, millions of Africans interact with their bank - swiping cards, making transfers, paying bills, receiving salaries. Behind each of these touch-points lies a rich behavioural signal. Understanding and anticipating customer transaction volumes is a foundational capability: it drives capacity planning, fraud detection, product development, and personalised service delivery. The question is deceptively simple - how many transactions will a given customer make in the next three months? - but the answer demands genuine data science skill.
In this challenge, you are provided with anonymised behavioural data for nearly 12,000 customers spanning up to 34 months of transaction history, monthly financial snapshots, and cleaned demographic profiles. Your task is to predict next_3m_txn_count - the total number of bank transactions each customer will make over a future three-month window (November 2015 through January 2016). This is a regression problem scored using Root Mean Squared Logarithmic Error (RMSLE), a metric that penalises large relative errors and handles the right-skewed distribution of transaction counts gracefully.
What makes this challenge compelling is its real-world texture. The data is not clean-room synthetic - it has the quirks of production banking data: high-cardinality free-text descriptions, partial nulls in income fields, seasonality effects from the November to January holiday period, and customers whose behaviour varies wildly from month to month. Success will reward thoughtful feature engineering, careful handling of temporal patterns, and models that generalise rather than memorise.
This challenge is more than a competition.
This challenge is a learning opportunity. Award: 500 Zindi Points.
The error metric for this challenge is Root Mean Squared Logarithmic Error (RMSLE), implemented as RMSE on log-transformed values. See the submission instructions below for how to format your predictions.
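To make the metric concrete, here is a minimal sketch of RMSLE written as RMSE on log1p-transformed values. The function names `rmse` and `rmsle` are illustrative, not part of any platform API, and the platform's exact scoring code is not shown in this brief, so treat this as a local validation aid under that assumption.

```python
import numpy as np


def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


def rmsle(y_true, y_pred):
    """RMSLE on raw counts: RMSE between log(1 + y_true) and log(1 + y_pred)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return rmse(np.log1p(y_true), np.log1p(y_pred))


# The metric depends on the ratio (1 + y_pred) / (1 + y_true), so doubling
# a small count costs the same as doubling a large one: both are ln(2).
small = rmsle([9], [19])     # ln(20 / 10)
large = rmsle([99], [199])   # ln(200 / 100)
print(small, large)
```

Because the error is measured on the log scale, a handful of very active customers does not dominate the score the way it would under plain RMSE on raw counts.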
Your submission file must contain exactly 2 columns: UniqueID and next_3m_txn_count.
Important - log-transformed submissions required: The platform scores using RMSE on log-transformed values, which is equivalent to RMSLE. You must submit the natural log of one plus each prediction, that is log(1 + y_pred), not log(y_pred) + 1. In Python: np.log1p(y_pred). Do not submit raw predicted counts - your score will be incorrect if you do.
The order of rows does not matter, but you must include predictions for all 3,584 customers in Test.csv. Your submission should look like this:
UniqueID                              next_3m_txn_count
6b62ce75-9823-4de6-ba7b-8b2b199df239  3.456
e193e600-a706-4bc6-8597-d5d6fb171ab5  4.321
8fd44803-12ed-46ab-a146-8496d95d1b13  2.789
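The formatting rules above can be sketched as a small helper that turns raw predicted counts into log1p-transformed submission rows. This is only an illustration under stated assumptions: the helper names, the output file name `submission.csv`, the comma-separated layout, and the raw counts in the usage example are all hypothetical, and clipping negative predictions to zero is a design choice rather than a platform requirement.

```python
import csv
import math

HEADER = ["UniqueID", "next_3m_txn_count"]


def to_submission_rows(raw_predictions):
    """Map {UniqueID: raw predicted count} to rows holding log(1 + count)."""
    rows = []
    for unique_id, count in raw_predictions.items():
        # Clip at zero (assumption): a count cannot be negative, and
        # log1p is undefined for values at or below -1.
        clipped = max(float(count), 0.0)
        rows.append([unique_id, round(math.log1p(clipped), 6)])
    return rows


def write_submission(raw_predictions, path="submission.csv"):
    """Write the two required columns to a CSV file (file name is hypothetical)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(to_submission_rows(raw_predictions))


# Hypothetical raw counts for the three IDs shown in the sample above.
example = {
    "6b62ce75-9823-4de6-ba7b-8b2b199df239": 30,
    "e193e600-a706-4bc6-8597-d5d6fb171ab5": 74,
    "8fd44803-12ed-46ab-a146-8496d95d1b13": 0,
}
rows = to_submission_rows(example)
```

`math.log1p` is the scalar counterpart of `np.log1p`; with a NumPy array of predictions, `np.log1p(np.clip(y_pred, 0, None))` does the same transformation in one call. Before uploading, it is worth checking that the file has one row for every customer in Test.csv.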