Bank Statement Parsing Benchmark

The first open benchmark for measuring bank statement extraction accuracy. It contains 15 synthetic statements from 12 countries, covering 36 parsing challenges across 3 difficulty tiers.

Published: March 15, 2026 · Last updated: March 17, 2026

Quick Start

Get started in three steps.

1. Download a statement

    curl -O https://bankstatemently.com/benchmark/statements/bsb-001-statement.pdf

2. Parse it with your tool

    your-parser bsb-001-statement.pdf > result.json

3. Submit for scoring

    HASH=$(shasum -a 256 bsb-001-statement.pdf | cut -d' ' -f1)
    curl -X POST https://api.bankstatemently.com/v1/benchmark/evaluate \
      -H "Content-Type: application/json" \
      -H "X-API-Key: your-api-key" \
      -d "{\"contentHash\": \"$HASH\", \"transactions\": $(cat result.json)}"
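
If your parser already runs in Python, the submission step can be sketched without shell string interpolation. The endpoint, header names, and the two body keys (contentHash, transactions) are copied from the curl command above; the helper names build_payload and build_request are hypothetical, and the shape of each transaction object is an assumption because it is not specified here.

```python
import hashlib
import json
import urllib.request

# Endpoint taken from the curl command in the Quick Start.
EVALUATE_URL = "https://api.bankstatemently.com/v1/benchmark/evaluate"


def build_payload(pdf_bytes: bytes, transactions: list) -> bytes:
    """Return the JSON request body: SHA-256 of the PDF plus the parsed rows."""
    content_hash = hashlib.sha256(pdf_bytes).hexdigest()
    body = {"contentHash": content_hash, "transactions": transactions}
    return json.dumps(body).encode("utf-8")


def build_request(payload: bytes, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request."""
    return urllib.request.Request(
        EVALUATE_URL,
        data=payload,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen sends it. Building the body with json.dumps avoids the quoting problems that can occur when JSON is assembled inside a shell string.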

Why Bank Statement Parsing Is Hard

Real-world bank statements contain dozens of formatting quirks that break naive parsers.

Dates & Time

Missing Year in Dates

Statements omit the year from transaction dates (e.g., "03/15" instead of "03/15/2025"), so the year has to be inferred before transactions can be sorted or reconciled.

In 4 statements
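
One possible way to recover the year is to use the statement period printed in the header. The sketch below assumes the period start and end dates have already been extracted and that the period touches at most two calendar years; infer_year is a hypothetical helper, not part of the benchmark.

```python
from datetime import date


def infer_year(month: int, day: int, period_start: date, period_end: date) -> date:
    """Pick the year that places month/day inside the statement period."""
    for year in (period_start.year, period_end.year):
        try:
            candidate = date(year, month, day)
        except ValueError:  # e.g. Feb 29 in a non-leap year
            continue
        if period_start <= candidate <= period_end:
            return candidate
    raise ValueError(f"{month:02d}/{day:02d} falls outside the statement period")
```

For a statement running from December 15, 2024 to January 14, 2025, "12/28" resolves to 2024 and "01/03" resolves to 2025.
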
Dates & Time

Blank Dates for Same-Day Transactions

Statements only print the date on the first transaction of each day — the rest have blank date cells.

In 1 statement
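
Blank date cells can be handled with a forward fill. This is a minimal sketch that assumes each row is a dict with a "date" key (a hypothetical schema) and that rows arrive in the order they are printed.

```python
def fill_blank_dates(rows):
    """Copy the last printed date into rows whose date cell is blank."""
    filled = []
    last_date = None
    for row in rows:
        raw = (row.get("date") or "").strip()
        if raw:
            last_date = raw
        elif last_date is None:
            # Nothing to inherit from: the very first row must carry a date.
            raise ValueError("first row has no date to inherit")
        filled.append({**row, "date": last_date})
    return filled
```

The function returns new dicts rather than mutating its input, so the raw extraction stays available for debugging.
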
Amounts & Currency

Plus/Minus Sign After the Amount

Statements place the plus or minus sign after the amount (e.g., "123.45-" instead of "-123.45").

In 1 statement
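
A trailing sign can be moved to the front before numeric conversion. The sketch below assumes a dot as the decimal separator and commas as thousands separators; parse_trailing_sign is a hypothetical helper.

```python
from decimal import Decimal


def parse_trailing_sign(text: str) -> Decimal:
    """Parse amounts such as "123.45-" or "1,234.56+" into a signed Decimal."""
    s = text.strip()
    sign = ""
    if s and s[-1] in "+-":
        # Detach the trailing sign and keep the bare number.
        sign, s = s[-1], s[:-1].strip()
    value = Decimal(s.replace(",", ""))
    return -value if sign == "-" else value
```

Amounts that already carry a leading sign pass through unchanged, so the same function can be applied to every amount cell.
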
Amounts & Currency

Currency Symbols Breaking Numeric Parsing

Statements embed currency symbols (e.g., "$1,234.56") directly in amount cells, which breaks numeric parsing in Excel and most CSV importers.

In 2 statements
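
Stripping everything that is not a digit, separator, or minus sign is one simple way to make such cells numeric. This sketch assumes a dot decimal separator and comma thousands separators; formats such as "1.234,56" would need separate handling. parse_amount is a hypothetical helper.

```python
import re
from decimal import Decimal

# Anything that is not a digit, dot, comma, or minus sign.
_NON_NUMERIC = re.compile(r"[^\d.,\-]")


def parse_amount(text: str) -> Decimal:
    """Parse cells such as "$1,234.56" or "USD 5.00" into a Decimal."""
    cleaned = _NON_NUMERIC.sub("", text)  # drops symbols, letters, spaces
    cleaned = cleaned.replace(",", "")    # drops thousands separators
    return Decimal(cleaned)
```

Decimal is used instead of float so that cent values are not subject to binary rounding.
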
Layout & Structure

No Visible Table Lines or Borders

Statements sometimes lack ruled row or column lines, so column boundaries have to be inferred from how the text is aligned.

In 1 statement
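
When there are no ruled lines, one common heuristic is to look for character positions that are blank in every row and treat them as column gaps. The sketch below assumes the text was extracted with its horizontal layout preserved (for example with pdftotext -layout) so that columns line up by character position; find_column_gaps and split_row are hypothetical helpers.

```python
def find_column_gaps(lines, min_gap=2):
    """Return (start, end) character spans that are blank in every line."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[i] == " " for row in padded) for i in range(width)]
    gaps, start = [], None
    # The trailing False closes a gap that runs to the end of the line.
    for i, is_blank in enumerate(blank + [False]):
        if is_blank and start is None:
            start = i
        elif not is_blank and start is not None:
            if i - start >= min_gap:
                gaps.append((start, i))
            start = None
    return gaps


def split_row(line, gaps):
    """Cut one line into cells at the given gaps."""
    cells, prev = [], 0
    for start, end in gaps:
        cells.append(line[prev:start].strip())
        prev = end
    cells.append(line[prev:].strip())
    return cells
```

The min_gap threshold keeps single spaces inside descriptions from being mistaken for column boundaries. Gaps should be computed over transaction rows only, since a header line with different alignment can hide them.
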
Layout & Structure

Separate Credit & Debit Columns

Statements use separate Credit and Debit columns in some formats, while others use a single Amount column with signs.

In 10 statements
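
Both layouts can be normalized to a single signed amount. The sketch below treats credits as positive and debits as negative, which is an assumption; the sign convention the benchmark expects is not stated here. signed_amount is a hypothetical helper.

```python
from decimal import Decimal


def signed_amount(credit: str, debit: str) -> Decimal:
    """Collapse separate Credit/Debit cells into one signed value."""
    credit, debit = credit.strip(), debit.strip()
    if credit and debit:
        raise ValueError("row has both a credit and a debit")
    if credit:
        return Decimal(credit.replace(",", ""))
    if debit:
        return -Decimal(debit.replace(",", ""))
    raise ValueError("row has neither a credit nor a debit")
```

Raising on rows with both or neither cell filled surfaces column misalignment early instead of silently producing a wrong sign.
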
Document Quality

Page Headers & Footers Mixed with Data

Page headers and footers contain bank logos, page numbers, and disclaimers that can interfere with transaction extraction.

In 1 statement
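
One heuristic for multi-page statements is to drop lines that repeat near the top or bottom of several pages. The sketch below assumes each page is available as a list of text lines; strip_repeated_edges and its parameters are hypothetical, and a single-page statement is left unchanged because nothing can repeat.

```python
import re
from collections import Counter


def _normalize(line):
    # Collapse digits so "Page 1 of 3" and "Page 2 of 3" compare equal.
    return re.sub(r"\d+", "#", line.strip())


def _edge_indexes(page, edge):
    n = len(page)
    return set(range(min(edge, n))) | set(range(max(n - edge, 0), n))


def strip_repeated_edges(pages, edge=3, min_pages=2):
    """Remove top/bottom lines that recur on at least min_pages pages."""
    counts = Counter()
    for page in pages:
        idx = _edge_indexes(page, edge)
        counts.update({_normalize(page[i]) for i in idx if page[i].strip()})
    repeated = {key for key, n in counts.items() if n >= min_pages}
    cleaned = []
    for page in pages:
        idx = _edge_indexes(page, edge)
        cleaned.append([
            line for i, line in enumerate(page)
            if not (i in idx and _normalize(line) in repeated)
        ])
    return cleaned
```

Only lines near the page edges are candidates, which lowers the risk of deleting a recurring transaction from the middle of a page. A transaction that happens to repeat at the edge of several pages could still be removed, so the result is worth checking against the row count.
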
Document Quality

No Selectable Text (Scanned PDF)

Statements are often image-based PDFs with no selectable text, so copy-paste and standard PDF extractors return nothing useful.

In 3 statements
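
A parser can decide per page whether to fall back to OCR by checking how much text the normal extractor returned. The sketch below assumes the per-page text has already been extracted by whatever PDF library you use; pages_needing_ocr and the 20-character threshold are hypothetical choices.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return indexes of pages whose text layer is effectively empty."""
    return [
        i for i, text in enumerate(page_texts)
        # Count non-whitespace characters only.
        if len("".join(text.split())) < min_chars
    ]
```

Checking each page separately handles documents that mix digital and scanned pages.
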

The Dataset

Intermediate (5 statements)

bsb-006 · Liberty National Bank · 🇲🇽 MX · Bank Statement · 1 page · 30 transactions · 4 challenges · Coming soon
bsb-007 · Continental Trust · 🇨🇦 CA · Credit Card · 1 page · 25 transactions · 3 challenges · Coming soon
bsb-008 · Southern Cross Financial · 🇦🇺 AU · Bank Statement · 1 page · 30 transactions · 5 challenges · Coming soon
bsb-009 · Harbour Bank · 🇬🇧 GB · Bank Statement · 1 page · 35 transactions · 6 challenges · Coming soon
bsb-010 · Straits Capital · 🇮🇳 IN · Bank Statement · 25 pages · 500 transactions · 6 challenges · Coming soon

Advanced (5 statements)

bsb-011 · Southern Cross Financial · 🇭🇰 HK · Bank Statement · 2 pages · 35 transactions · 6 challenges · Coming soon
bsb-012 · Straits Capital · 🇸🇬 SG · Credit Card · 1 page · 33 transactions · 6 challenges · Coming soon
bsb-013 · Silk Road Banking · 🇰🇿 KZ · Bank Statement · 2 pages · 35 transactions · 5 challenges · Coming soon
bsb-014 · Silk Road Banking · 🇹🇭 TH · Bank Statement · 1 page · 28 transactions · 5 challenges · Coming soon
bsb-015 · Southern Cross Financial · 🇲🇾 MY · Credit Card · 1 page · 20 transactions · 7 challenges · Coming soon

How Scoring Works

Your Parser Output → Bankstatemently Evaluator → Accuracy Score

Your score is measured across two dimensions — the same framework we use to independently test commercial converters.

Extraction Accuracy

Field-by-field comparison of dates, amounts, descriptions, and balances against ground truth.
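
To get a rough local estimate before submitting, a field-by-field comparison can be sketched as follows. This is not the evaluator's actual method: the key names, the position-based row matching, and exact string equality are all assumptions made for illustration.

```python
# Field names follow the description above; the exact keys are an assumption.
FIELDS = ("date", "amount", "description", "balance")


def field_accuracy(predicted, truth, fields=FIELDS):
    """Fraction of ground-truth rows whose field value matches, per field."""
    scores = {}
    for field in fields:
        matches = sum(
            1
            for p, t in zip(predicted, truth)  # rows are paired by position
            if str(p.get(field, "")).strip() == str(t.get(field, "")).strip()
        )
        # Rows missing from the prediction count as misses.
        scores[field] = matches / len(truth) if truth else 0.0
    return scores
```

Because rows are paired by position, a single dropped row shifts every later comparison. Extra predicted rows beyond the ground truth are ignored by this sketch, which a real scorer would likely penalize.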

Statement Integrity

Balance reconciliation, total validation, and row-level alignment across the full statement.
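
Balance reconciliation can also be checked locally. The sketch below assumes signed amounts (credits positive, debits negative), a running balance on every row, and known opening and closing balances; reconcile is a hypothetical helper and not the evaluator's implementation.

```python
from decimal import Decimal


def reconcile(opening, closing, rows):
    """rows: list of (amount, running_balance) Decimals in statement order.

    Returns (total_ok, mismatched_row_indexes).
    """
    total = sum((amount for amount, _ in rows), Decimal("0"))
    total_ok = opening + total == closing
    mismatched = []
    expected = opening
    for i, (amount, balance) in enumerate(rows):
        expected += amount
        if balance != expected:
            mismatched.append(i)
            expected = balance  # resync so one bad row is reported once
    return total_ok, mismatched
```

The total check catches missing or duplicated rows, while the row-level check points at the specific row where an amount or balance was misread.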

Frequently Asked Questions

What is a bank statement parsing benchmark?

A bank statement parsing benchmark is a standardized dataset of bank statements with known ground-truth data. It lets you measure how accurately a parser extracts transactions, dates, amounts, and balances from PDF bank statements. Our benchmark includes 15 synthetic statements covering 36 distinct parsing challenges across 12 countries.

For Developers

Download the dataset, integrate the evaluation API, and benchmark your parser.

For Teams Evaluating Parsers

See how your current solution performs, or try Bankstatemently on your own statements.