Bank Statement Parsing Benchmark

The first open benchmark for measuring bank statement extraction accuracy. 15 synthetic statements, 34 parsing challenges, 12 countries, across 3 difficulty tiers.


Quick Start

Get started in three steps.

1. Download a statement
curl -O https://bankstatemently.com/benchmark/statements/bsb-001-statement.pdf
2. Parse it with your tool
your-parser bsb-001-statement.pdf > result.json
3. Submit for scoring
HASH=$(shasum -a 256 bsb-001-statement.pdf | cut -d' ' -f1)
curl -X POST https://api.bankstatemently.com/v1/benchmark/evaluate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d "{\"contentHash\": \"$HASH\", \"transactions\": $(cat result.json)}"
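
For parsers written in Python, the same submission can be assembled without curl. The sketch below mirrors step 3 using only the standard library. The endpoint, the headers, and the contentHash and transactions fields come from the command above; the function name build_evaluate_request is illustrative, and the sketch assumes result.json holds a JSON array of transactions. The response format is not described here, so the example stops at building the request.

```python
import hashlib
import json
import urllib.request

EVALUATE_URL = "https://api.bankstatemently.com/v1/benchmark/evaluate"


def build_evaluate_request(pdf_bytes, transactions, api_key):
    """Build the POST request that the curl command in step 3 sends."""
    payload = {
        # SHA-256 of the statement PDF, the same digest `shasum -a 256` prints
        "contentHash": hashlib.sha256(pdf_bytes).hexdigest(),
        # Parser output, i.e. the parsed contents of result.json
        "transactions": transactions,
    }
    return urllib.request.Request(
        EVALUATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

To send it, read the PDF in binary mode, load result.json with json.load, and pass the returned request to urllib.request.urlopen.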

Why Bank Statement Parsing Is Hard

Real-world bank statements contain dozens of formatting quirks that break naive parsers.

Dates

Missing Year in Dates

Statements omit the year from transaction dates (e.g., "03/15" instead of "03/15/2025"), which makes it impossible to sort or reconcile without guessing.

In 5 statements
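
One common way to recover the missing year is to use the statement period printed in the header. The sketch below assumes the parser has already extracted the period start and end dates; infer_year is a hypothetical helper, not part of the benchmark.

```python
from datetime import date


def infer_year(month, day, period_start, period_end):
    """Return the full date for a year-less month/day within a statement period.

    Tries each calendar year the period touches and returns the first
    candidate that falls inside [period_start, period_end].
    """
    for year in range(period_start.year, period_end.year + 1):
        try:
            candidate = date(year, month, day)
        except ValueError:  # e.g. Feb 29 in a non-leap year
            continue
        if period_start <= candidate <= period_end:
            return candidate
    raise ValueError(f"{month:02d}/{day:02d} is outside the statement period")
```

This handles statements that span a year boundary, where December and January rows belong to different years.
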
Dates

Blank Dates for Same-Day Transactions

Statements only print the date on the first transaction of each day; the rest have blank date cells.

In 1 statement
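
A simple way to handle this is to carry the last printed date forward. The sketch below assumes rows are dictionaries with a "date" key, which is an illustrative structure and not a format the benchmark prescribes.

```python
def fill_blank_dates(rows):
    """Copy the most recent printed date into rows whose date cell is blank."""
    filled = []
    last_date = None
    for row in rows:
        raw = (row.get("date") or "").strip()
        if raw:
            last_date = raw
        elif last_date is None:
            raise ValueError("first row has no date to inherit")
        # Build a new dict so the caller's rows are left untouched
        filled.append({**row, "date": last_date})
    return filled
```
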
Amounts

Plus/Minus Sign After the Amount

Statements place the plus or minus sign after the amount (e.g., "123.45-" instead of "-123.45").

In 0 statements
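
Trailing signs can be normalized before numeric conversion. This is a minimal sketch that assumes a comma thousands separator and a period decimal separator; parse_trailing_sign is a hypothetical helper name.

```python
from decimal import Decimal


def parse_trailing_sign(text):
    """Parse amounts such as "123.45-" or "123.45+" into a signed Decimal."""
    text = text.strip()
    sign = ""
    if text and text[-1] in "+-":
        # Split off the trailing sign and keep the numeric part
        sign, text = text[-1], text[:-1].strip()
    value = Decimal(text.replace(",", ""))
    return -value if sign == "-" else value
```
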
Amounts

Currency Symbols Breaking Numeric Parsing

Statements embed currency symbols (e.g., "$1,234.56") directly in amount cells, which breaks numeric parsing in Excel and most CSV importers.

In 2 statements
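
Stripping everything that is not part of the number is usually enough for cells like these. The sketch below assumes a comma thousands separator and a period decimal separator; currency markers that themselves contain a period (such as "Rs.") would need extra handling. The helper name strip_currency is illustrative.

```python
import re
from decimal import Decimal

# Keep only digits, separators, and a sign; everything else is dropped.
_NON_NUMERIC = re.compile(r"[^\d.,+-]")


def strip_currency(text):
    """Turn "$1,234.56" or "USD 1,234.56" into Decimal("1234.56")."""
    cleaned = _NON_NUMERIC.sub("", text)
    return Decimal(cleaned.replace(",", ""))
```
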
Layout

No Visible Table Lines or Borders

Some statements have no visible row or column lines, so a parser must infer cell boundaries from text alignment alone.

In 1 statement
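
Without ruling lines, column boundaries are often inferred from where text starts on the page. The sketch below clusters left-edge x coordinates using a gap threshold. It assumes a PDF text extractor has already supplied one x coordinate per cell fragment; the function names and the 20-point default gap are illustrative, and right-aligned amount columns would typically be clustered on right edges instead.

```python
def infer_column_starts(x_positions, min_gap=20.0):
    """Cluster left-edge x coordinates into column start positions.

    A new column begins whenever the gap to the previous sorted
    coordinate exceeds min_gap (in PDF points).
    """
    starts = []
    previous = None
    for x in sorted(x_positions):
        if previous is None or x - previous > min_gap:
            starts.append(x)
        previous = x
    return starts


def assign_column(x, starts):
    """Return the index of the last column starting at or left of x (0 if none)."""
    index = 0
    for i, start in enumerate(starts):
        if x >= start:
            index = i
    return index
```
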
Layout

Separate Credit & Debit Columns

Some statements split amounts into separate Credit and Debit columns, while others use a single signed Amount column.

In 10 statements
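
Both layouts can be reduced to one signed amount per row. The sketch below treats debits as negative and credits as positive, which is an assumed convention; the sign convention the evaluator expects is not stated here. The helper name signed_amount is illustrative.

```python
from decimal import Decimal


def signed_amount(debit, credit):
    """Collapse separate Debit/Credit cells into one signed amount.

    Debits become negative and credits positive; exactly one of the
    two cells is expected to be filled for each row.
    """
    debit = (debit or "").strip()
    credit = (credit or "").strip()
    if bool(debit) == bool(credit):
        raise ValueError("expected exactly one of debit or credit")
    if debit:
        return -Decimal(debit.replace(",", ""))
    return Decimal(credit.replace(",", ""))
```
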
Structure

Descriptions Wrapped Across Lines

Transaction descriptions often span multiple lines in the PDF, making it hard to tell where one transaction ends and the next begins.

In 1 statement
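
A common heuristic is to treat any row with no date and no amount as a continuation of the row above it. The sketch below assumes rows are dictionaries with "date", "description", and "amount" keys, which is an illustrative structure.

```python
def merge_wrapped_rows(rows):
    """Fold continuation lines into the transaction above them.

    A row with no date and no amount is treated as the continuation
    of the previous row's description.
    """
    merged = []
    for row in rows:
        is_continuation = not row.get("date") and not row.get("amount")
        if is_continuation and merged:
            merged[-1]["description"] += " " + row["description"].strip()
        else:
            merged.append(dict(row))  # copy so the input rows are not mutated
    return merged
```

Because the check also requires an empty amount, rows that merely have a blank date cell but carry an amount are kept as separate transactions.
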
Structure

Phantom Balance Rows on Every Page

Statements repeat "Balance brought forward" and "Balance carried forward" rows on every page, which a parser can mistake for transactions.

In 1 statement
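
These rows can be filtered out by matching their labels. The sketch below covers only the two labels quoted above and assumes each row is a dictionary with a "description" key; other banks may word these rows differently.

```python
import re

# Matches the two carry-over labels, ignoring case and extra spacing.
_PHANTOM = re.compile(r"^\s*balance\s+(brought|carried)\s+forward\s*$", re.IGNORECASE)


def drop_phantom_rows(rows):
    """Remove per-page balance rows so they are not counted as transactions."""
    return [row for row in rows if not _PHANTOM.match(row["description"])]
```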

The Dataset

Basic: 5 statements

bsb-001: Straits Capital, 🇸🇬 SG, Bank Statement, 3 pages, 12 transactions, 3 challenges (Download PDF)
bsb-002: Liberty National Bank, 🇺🇸 US, Credit Card, 4 pages, 15 transactions, 4 challenges (Coming soon)
bsb-003: Continental Trust, 🇳🇱 NL, Bank Statement, 2 pages, 22 transactions, 5 challenges (Coming soon)
bsb-004: Silk Road Banking, 🇭🇰 HK, Bank Statement, 3 pages, 25 transactions, 4 challenges (Coming soon)
bsb-005: Harbour Bank, 🇨🇦 CA, Bank Statement, 1 page, 25 transactions, 3 challenges (Coming soon)

Intermediate: 5 statements

bsb-006: Liberty National Bank, 🇲🇽 MX, Bank Statement, 1 page, 30 transactions, 4 challenges (Coming soon)
bsb-007: Continental Trust, 🇨🇦 CA, Credit Card, 1 page, 25 transactions, 4 challenges (Coming soon)
bsb-008: Southern Cross Financial, 🇦🇺 AU, Bank Statement, 2 pages, 30 transactions, 6 challenges (Coming soon)
bsb-009: Harbour Bank, 🇬🇧 GB, Bank Statement, 1 page, 35 transactions, 6 challenges (Coming soon)
bsb-010: Straits Capital, 🇮🇳 IN, Bank Statement, 25 pages, 500 transactions, 6 challenges (Coming soon)

Advanced: 5 statements

bsb-011: Southern Cross Financial, 🇭🇰 HK, Bank Statement, 2 pages, 35 transactions, 6 challenges (Coming soon)
bsb-012: Straits Capital, 🇸🇬 SG, Credit Card, 1 page, 33 transactions, 6 challenges (Coming soon)
bsb-013: Silk Road Banking, 🇰🇿 KZ, Bank Statement, 2 pages, 35 transactions, 5 challenges (Coming soon)
bsb-014: Silk Road Banking, 🇹🇭 TH, Bank Statement, 1 page, 28 transactions, 6 challenges (Coming soon)
bsb-015: Southern Cross Financial, 🇲🇾 MY, Credit Card, 1 page, 20 transactions, 6 challenges (Coming soon)

How Scoring Works

Your Parser Output → Bankstatemently Evaluator → Accuracy Score

Your score is measured across two dimensions:

Extraction Accuracy

Field-by-field comparison of dates, amounts, descriptions, and balances against ground truth.

Statement Integrity

Balance reconciliation, total validation, and row-level alignment across the full statement.
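
To make the two dimensions concrete, here is a rough local approximation. It is not the Bankstatemently evaluator, whose matching rules and weighting are not described here. The sketch assumes rows are dictionaries with "date", "amount", "description", and "balance" keys, compares rows by position with exact equality, and assumes signed amounts with a running balance on every row.

```python
from decimal import Decimal

FIELDS = ("date", "amount", "description", "balance")


def field_accuracy(predicted, truth):
    """Fraction of ground-truth fields matched exactly, comparing rows by position."""
    total = len(truth) * len(FIELDS)
    if total == 0:
        return 0.0
    correct = sum(
        1
        for pred_row, true_row in zip(predicted, truth)
        for field in FIELDS
        if pred_row.get(field) == true_row.get(field)
    )
    return correct / total


def balances_reconcile(opening, rows):
    """Check that each running balance equals the previous balance plus the amount."""
    running = Decimal(opening)
    for row in rows:
        running += Decimal(row["amount"])
        if running != Decimal(row["balance"]):
            return False
    return True
```

Running checks like these before submitting can catch dropped rows or sign errors early, since a single missing transaction breaks the balance chain.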

Frequently Asked Questions

What is a bank statement parsing benchmark?

A bank statement parsing benchmark is a standardized dataset of bank statements with known ground-truth data. It lets you measure how accurately a parser extracts transactions, dates, amounts, and balances from PDF bank statements. Our benchmark includes 15 synthetic statements covering 34 distinct parsing challenges across 12 countries.

For Developers

Download the dataset, integrate the evaluation API, and benchmark your parser.

For Teams Evaluating Parsers

See how your current solution performs, or try Bankstatemently on your own statements.