How do you measure bank statement extraction accuracy?

We evaluate two dimensions: extraction accuracy (field-by-field comparison of dates, amounts, descriptions, and balances against ground truth) and statement integrity (balance reconciliation, total validation, and row-level alignment across the full statement).

Why use synthetic statements instead of real ones?

Synthetic statements eliminate privacy and legal concerns entirely — no real customer data is involved. They also let us control exactly which parsing challenges appear in each statement, making results reproducible. Every statement is generated from realistic templates modeled on actual bank formats.

Can I submit my parser for evaluation?

Yes. Parse any of the public statements, then submit your results to our evaluation API. You will receive a detailed accuracy breakdown showing where your parser succeeded and where it struggled. See the Quick Start section or visit the API documentation for integration details.

Is the benchmark dataset free to use?

Yes. All 15 statements are freely available for download. Ground truth data stays server-side — you submit your parsed results to the evaluation API and receive a score. This prevents overfitting while keeping the full dataset accessible.

Bank Statement Parsing Benchmark

Name: Bank Statement Parsing Benchmark
Creator: Bankstatemently
Published: 2026-03-15
License: https://creativecommons.org/licenses/by/4.0/

The first open benchmark for measuring bank statement extraction accuracy. It includes 15 synthetic statements covering 36 parsing challenges from 12 countries, across 3 difficulty tiers.

Last updated: March 17, 2026

Quick Start

Get started in three steps.

1. Download a statement

curl -O https://bankstatemently.com/benchmark/statements/bsb-001-statement.pdf

2. Parse it with your tool

your-parser bsb-001-statement.pdf > result.json

3. Submit for scoring

HASH=$(shasum -a 256 bsb-001-statement.pdf | cut -d' ' -f1)
curl -X POST https://api.bankstatemently.com/v1/benchmark/evaluate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d "{\"contentHash\": \"$HASH\", \"transactions\": $(cat result.json)}"
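The same request body can be assembled in Python, which avoids hand-escaping JSON inside a shell string. This is a minimal sketch: the `contentHash` and `transactions` field names are taken from the curl command above, while `build_payload` is a hypothetical helper and the fields inside each transaction are placeholders for whatever your parser emits.

```python
import hashlib
import json


def build_payload(pdf_bytes: bytes, transactions: list) -> str:
    """Return a JSON body shaped like the one in the curl example.

    contentHash is the SHA-256 hex digest of the statement PDF, the same
    value `shasum -a 256` prints in the shell version.
    """
    content_hash = hashlib.sha256(pdf_bytes).hexdigest()
    return json.dumps({"contentHash": content_hash, "transactions": transactions})


if __name__ == "__main__":
    # Stand-in bytes; in practice read bsb-001-statement.pdf in binary mode.
    # The transaction keys below are assumed for illustration only.
    body = build_payload(b"%PDF example", [{"date": "2025-03-15", "amount": "-12.50"}])
    print(body)
```

The resulting string can be sent as the POST body with any HTTP client, using the same headers as the curl command.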

Why Bank Statement Parsing Is Hard

Real-world bank statements contain dozens of formatting quirks that break naive parsers.

Dates & Time

Missing Year in Dates

Statements omit the year from transaction dates (e.g., "03/15" instead of "03/15/2025"), which makes it impossible to sort or reconcile without guessing.

In 4 statements
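One common way to recover the year is to use the statement period printed in the header. The sketch below assumes your parser has already extracted that period as two dates; `infer_year` is a hypothetical helper, not part of the benchmark.

```python
from datetime import date


def infer_year(month: int, day: int, period_start: date, period_end: date) -> date:
    """Pick the year that places month/day inside the statement period."""
    for year in range(period_start.year, period_end.year + 1):
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # e.g. 29 February in a non-leap year
        if period_start <= candidate <= period_end:
            return candidate
    raise ValueError(f"{month:02d}/{day:02d} falls outside the statement period")
```

This handles statements that span a year boundary, where "12/28" and "01/03" belong to different years.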

Dates & Time

Blank Dates for Same-Day Transactions

Statements only print the date on the first transaction of each day — the rest have blank date cells.

In 1 statement
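Blank date cells can usually be handled by carrying the last seen date forward. A minimal sketch, assuming each extracted row is a dict with a `date` key (a hypothetical row shape):

```python
def fill_blank_dates(rows: list[dict]) -> list[dict]:
    """Copy the most recent non-blank date into rows whose date is blank."""
    last_date = None
    filled = []
    for row in rows:
        raw = (row.get("date") or "").strip()
        if raw:
            last_date = raw
        elif last_date is None:
            raise ValueError("first row has no date to carry forward")
        filled.append({**row, "date": last_date})
    return filled
```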

Amounts & Currency

Plus/Minus Sign After the Amount

Statements place the plus or minus sign after the amount (e.g., "123.45-" instead of "-123.45").

In 1 statement
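A trailing sign can be moved to the front before numeric conversion. The helper name `parse_trailing_sign` is hypothetical, and the sketch assumes the amount has no currency symbol or thousands separator left in it.

```python
from decimal import Decimal


def parse_trailing_sign(text: str) -> Decimal:
    """Convert amounts like '123.45-' or '123.45+' to a signed Decimal."""
    s = text.strip()
    sign = ""
    if s and s[-1] in "+-":
        # Split off the trailing sign and put it in front of the digits.
        sign, s = s[-1], s[:-1].strip()
    return Decimal(sign + s)
```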

Amounts & Currency

Currency Symbols Breaking Numeric Parsing

Statements embed currency symbols (e.g., "$1,234.56") directly in amount cells, which breaks numeric parsing in Excel and most CSV importers.

In 2 statements
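Stripping everything except digits, the decimal point, and a minus sign is often enough to make such cells numeric. This sketch assumes "." is the decimal separator and "," groups thousands; statements that use the opposite convention (e.g. "1.234,56") need a locale-aware variant. `parse_amount` is a hypothetical helper.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Drop currency symbols, letters, spaces, and thousands separators."""
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    return Decimal(cleaned)
```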

Layout & Structure

No Visible Table Lines or Borders

Some statements have no ruled row or column lines, so a parser has to infer the table structure from text positions alone.

In 1 statement

Layout & Structure

Separate Credit & Debit Columns

Some statements split amounts into separate Credit and Debit columns, while others use a single signed Amount column. A parser has to handle both layouts.

In 10 statements
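Both layouts can be normalized to one signed amount per row. The sketch below assumes credits are positive and debits negative, which is a convention chosen for illustration; check the API documentation for the sign convention the evaluator expects. `signed_amount` is a hypothetical helper.

```python
from decimal import Decimal


def signed_amount(debit: str, credit: str) -> Decimal:
    """Collapse separate Debit/Credit cells into one signed amount."""
    d = Decimal(debit.strip()) if debit.strip() else Decimal("0")
    c = Decimal(credit.strip()) if credit.strip() else Decimal("0")
    if d and c:
        # A row with both cells filled usually signals a column misalignment.
        raise ValueError("row has both a debit and a credit")
    return c - d
```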

Document Quality

Page Headers & Footers Mixed with Data

Page headers and footers contain bank logos, page numbers, and disclaimers that can interfere with transaction extraction.

In 1 statement

Document Quality

No Selectable Text (Scanned PDF)

Statements are often image-based PDFs with no selectable text, so copy-paste and standard PDF extractors return nothing useful.

In 3 statements

Explore all 36 challenges

The Dataset

Basic (5 statements)

ID       Bank                      Country  Type            Pages  Transactions  Challenges  Availability
bsb-001  Straits Capital           🇸🇬 SG    Bank Statement  3      12            3           PDF download
bsb-002  Liberty National Bank     🇺🇸 US    Credit Card     4      15            5           PDF download
bsb-003  Continental Trust         🇳🇱 NL    Bank Statement  3      22            5           PDF download
bsb-004  Silk Road Banking         🇭🇰 HK    Bank Statement  4      25            5           PDF download
bsb-005  Harbour Bank              🇨🇦 CA    Bank Statement  2      25            5           PDF download

Intermediate (5 statements)

ID       Bank                      Country  Type            Pages  Transactions  Challenges  Availability
bsb-006  Liberty National Bank     🇲🇽 MX    Bank Statement  1      30            4           Coming soon
bsb-007  Continental Trust         🇨🇦 CA    Credit Card     1      25            3           Coming soon
bsb-008  Southern Cross Financial  🇦🇺 AU    Bank Statement  1      30            5           Coming soon
bsb-009  Harbour Bank              🇬🇧 GB    Bank Statement  1      35            6           Coming soon
bsb-010  Straits Capital           🇮🇳 IN    Bank Statement  25     500           6           Coming soon

Advanced (5 statements)

ID       Bank                      Country  Type            Pages  Transactions  Challenges  Availability
bsb-011  Southern Cross Financial  🇭🇰 HK    Bank Statement  2      35            6           Coming soon
bsb-012  Straits Capital           🇸🇬 SG    Credit Card     1      33            6           Coming soon
bsb-013  Silk Road Banking         🇰🇿 KZ    Bank Statement  2      35            5           Coming soon
bsb-014  Silk Road Banking         🇹🇭 TH    Bank Statement  1      28            5           Coming soon
bsb-015  Southern Cross Financial  🇲🇾 MY    Credit Card     1      20            7           Coming soon

Browse all 36 parsing challenges →

How Scoring Works

Your Parser Output → Bankstatemently Evaluator → Accuracy Score

Your score is measured across two dimensions — the same framework we use to independently test commercial converters.

Extraction Accuracy

Field-by-field comparison of dates, amounts, descriptions, and balances against ground truth.
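To get a feel for field-by-field comparison before submitting, you can run a similar check locally against statements you have labelled yourself. This is an illustrative sketch only: the benchmark's ground truth stays server-side and its exact matching rules are not described here, so the exact-match rule, the positional row pairing, and the field names below are all assumptions.

```python
FIELDS = ("date", "amount", "description", "balance")


def field_accuracy(parsed: list[dict], truth: list[dict]) -> dict[str, float]:
    """Per-field share of ground-truth rows matched exactly.

    Rows are paired by position; ground-truth rows with no parsed
    counterpart count as misses, extra parsed rows are ignored.
    """
    scores = {}
    for field in FIELDS:
        hits = sum(1 for p, t in zip(parsed, truth) if p.get(field) == t.get(field))
        scores[field] = hits / len(truth) if truth else 0.0
    return scores
```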

Statement Integrity

Balance reconciliation, total validation, and row-level alignment across the full statement.
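Balance reconciliation can also be checked locally as a sanity test on your own output. The sketch assumes signed `Decimal` amounts (credits positive, debits negative) and an optional per-row `balance`; `reconcile` and the row shape are hypothetical and do not reflect the evaluator's actual implementation.

```python
from decimal import Decimal


def reconcile(opening: Decimal, closing: Decimal, rows: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the statement reconciles."""
    problems = []
    running = opening
    for i, row in enumerate(rows):
        running += row["amount"]
        printed = row.get("balance")
        if printed is not None and printed != running:
            problems.append(f"row {i}: balance {printed} != expected {running}")
            running = printed  # resync so one bad row is reported only once
    if running != closing:
        problems.append(f"closing balance {closing} != computed {running}")
    return problems
```

A non-empty result typically points to a missed row, a wrong sign, or a misread amount near the reported row index.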

Frequently Asked Questions

What is a bank statement parsing benchmark?

A bank statement parsing benchmark is a standardized dataset of bank statements with known ground-truth data. It lets you measure how accurately a parser extracts transactions, dates, amounts, and balances from PDF bank statements. Our benchmark includes 15 synthetic statements covering 36 distinct parsing challenges across 12 countries.

For Developers

Download the dataset, integrate the evaluation API, and benchmark your parser.

For Teams Evaluating Parsers

See how your current solution performs, or try Bankstatemently on your own statements.

See how commercial tools score on our benchmark