Case study · May 8, 2026

How an EU Importer Cut Invoice Processing from 9 Hours to 75 Minutes a Week

A small import/distribution team replaced manual invoice keying with an agentic AP flow: vision extraction, rule validation, API posting, and human review only on outliers. The result was faster closes, fewer errors, and payback in weeks, not months.

For many small import/distribution companies, accounts payable is still a spreadsheet-and-copy-paste operation. The business we’re covering here (anonymised, EU-based, ~20 staff) was no different: supplier invoices arrived in multiple formats, a bookkeeper keyed line items manually into accounting software, and month-end always came with a rush.

The process worked, but it was fragile and expensive in the way manual back-office work usually is: steady effort every week, plus periodic stress when volume spiked.

The manual baseline

Before automation, the flow looked like this:

Invoices arrived by email as PDFs (some digital, some scanned).
The bookkeeper opened each file and copied header fields and totals into the accounting system.
VAT treatment and account mapping were checked manually.
Any missing fields triggered a back-and-forth over email.
A second pass happened before payment runs to catch obvious mistakes.

On average, the team processed about 90–110 supplier invoices per week. Manual entry and checks consumed roughly 9 hours/week of the bookkeeper’s time, with additional load at month-end.

The observed first-pass error rate (wrong field, wrong account code, typo in totals, duplicate entry) was 2.8%. Most errors were recoverable, but they still created rework and delayed approvals.

What we built

We implemented a narrow AP agent focused on one job: convert inbound invoices into clean draft postings, then route edge cases to humans.

The pipeline had four stages:

Document intake and extraction: invoices were pulled from a dedicated mailbox and sent to a Codex vision extraction step to read supplier name, invoice number, dates, VAT lines, currency, subtotals, totals, and line-item details.
Validation tool layer: extracted data was checked against deterministic rules (required fields, sum checks, VAT consistency, duplicate invoice number per supplier, known supplier mappings, valid cost centers).
Accounting API posting: invoices that passed validation were posted as drafts through the accounting platform’s API, with the source PDF linked for auditability.
Outlier queue for human review: only failed validations went to a reviewer queue with a short reason code (for example: “VAT mismatch > tolerance” or “unknown supplier mapping”).
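To make the validation and routing stages concrete, here is a minimal Python sketch of how deterministic checks could produce reason codes. It is an illustration under stated assumptions, not the production code: the `Invoice` fields, the `TOLERANCE` value, and the `validate` and `route` helpers are all hypothetical names chosen for this example, and a real invoice would carry multiple VAT lines rather than a single rate.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Invoice:
    supplier: str
    number: str
    net: Decimal       # subtotal before VAT
    vat: Decimal       # VAT amount
    total: Decimal     # gross total
    vat_rate: Decimal  # e.g. Decimal("0.21"); single-rate simplification


# Assumed rounding tolerance in invoice currency (hypothetical value).
TOLERANCE = Decimal("0.02")


def validate(inv: Invoice, known_suppliers: set[str],
             seen: set[tuple[str, str]]) -> list[str]:
    """Return reason codes; an empty list means every rule passed."""
    reasons = []
    if not inv.supplier or not inv.number:
        reasons.append("missing required field")
    if inv.supplier not in known_suppliers:
        reasons.append("unknown supplier mapping")
    if (inv.supplier, inv.number) in seen:
        reasons.append("duplicate invoice number")
    if abs(inv.net + inv.vat - inv.total) > TOLERANCE:
        reasons.append("sum check failed")
    if abs(inv.net * inv.vat_rate - inv.vat) > TOLERANCE:
        reasons.append("VAT mismatch > tolerance")
    return reasons


def route(inv: Invoice, known_suppliers: set[str],
          seen: set[tuple[str, str]]) -> tuple[str, list[str]]:
    """Send clean invoices to draft posting and everything else to review."""
    reasons = validate(inv, known_suppliers, seen)
    if reasons:
        return "review_queue", reasons
    seen.add((inv.supplier, inv.number))  # remember only posted drafts
    return "post_draft", []
```

In this sketch the rules never guess: an invoice either passes every check and becomes a draft, or it lands in the review queue carrying the exact reasons it failed.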

This is the key design choice: the agent didn’t try to “be smart” about everything. It automated the boring majority and escalated uncertainty quickly.

Results after rollout

After a short stabilization period, the numbers were clear:

Weekly AP handling time dropped from ~9 hours to ~1 hour 15 minutes (about 86% reduction).
First-pass error rate dropped from 2.8% to 0.6%.
72% of invoices posted straight through without manual edits.
The remaining 28% were routed to review and were usually resolved in under 3 minutes each.
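For readers who want to check the headline figure, the reduction follows directly from the two weekly totals reported above. The variable names are ours, and the inputs are the rounded figures from this article rather than raw timesheet data.

```python
before_min = 9 * 60   # ~9 hours per week, in minutes
after_min = 75        # ~1 hour 15 minutes per week

saved_min = before_min - after_min
reduction = saved_min / before_min

print(f"Saved per week: {saved_min / 60:.2f} hours")  # Saved per week: 7.75 hours
print(f"Reduction: {reduction:.1%}")                  # Reduction: 86.1%
```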

The practical effect wasn’t just time saved. Month-end close became less chaotic because entries were posted and validated continuously through the month rather than batched in stressful blocks.

On economics, implementation and integration costs were recovered in about 8 weeks from labor savings and reduced rework alone. That estimate needs no heroic assumptions, just fewer manual touches and fewer corrections.

One thing that failed on the first try

The first version tried to auto-map expense accounts from line-item descriptions using loose text similarity. On paper it looked efficient. In production, it created subtle mistakes: similar wording from different suppliers mapped to the wrong account in edge cases.

Nothing catastrophic, but enough to erode trust.

We replaced it with a stricter approach:

supplier-specific mapping tables,
confidence thresholds,
and explicit “unknown mapping” escalation.
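A minimal sketch of what this stricter mapping could look like in Python follows. Everything in it is hypothetical: the supplier names, keywords, account codes, the `MIN_CONFIDENCE` threshold, and the `map_account` helper are placeholders for illustration, not the client's actual tables or values.

```python
from typing import Optional

# Hypothetical supplier-specific tables: keyword -> account code.
SUPPLIER_ACCOUNT_MAP = {
    "Acme Freight": {"freight": "6100", "customs": "6110"},
    "Nordic Packaging": {"cartons": "5200"},
}

# Assumed minimum extraction confidence before any automatic mapping.
MIN_CONFIDENCE = 0.90


def map_account(supplier: str, description: str,
                confidence: float) -> Optional[str]:
    """Return an account code, or None to escalate as 'unknown mapping'."""
    if confidence < MIN_CONFIDENCE:
        return None  # low-confidence extraction: never guess
    table = SUPPLIER_ACCOUNT_MAP.get(supplier)
    if table is None:
        return None  # no table for this supplier yet
    text = description.lower()
    matches = {code for keyword, code in table.items() if keyword in text}
    # Escalate when nothing matches or keywords point to different accounts.
    return matches.pop() if len(matches) == 1 else None


print(map_account("Acme Freight", "Ocean freight Rotterdam", 0.97))  # 6100
print(map_account("Nordic Packaging", "Ocean freight", 0.97))        # None
```

The second call shows the point of scoping tables per supplier: the same wording from a different supplier does not inherit another supplier's account, it escalates instead.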

Accuracy improved immediately. Throughput stayed high because most recurring suppliers already had stable patterns.

The lesson is simple: in AP, deterministic controls beat clever guesses unless you have strong guardrails and high-quality historical data.

Why this worked for a small team

Small businesses often assume AP automation requires enterprise-scale systems. It doesn’t. What matters is scoped design:

one narrow workflow,
clear acceptance rules,
and a clean handoff to humans when confidence is low.

The team kept full control. Every posted draft had source traceability. Every exception had a reason code. And no one had to rebuild the accounting stack.

From an operations perspective, this is what “agentic” should mean in finance: not replacing accounting judgment, but removing repetitive mechanics so judgment is used where it matters.

Implementation notes for operators

If you’re considering a similar rollout, three decisions matter early:

Define your validation policy before extraction tuning. Teams often reverse this and end up with noisy outputs they can’t govern.
Track exception reasons from day one. Those labels become your roadmap for the next automation gains.
Start with draft posting, not final posting. It accelerates adoption because finance keeps control while confidence builds.
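Tracking exception reasons can start very small. The sketch below assumes each reviewed invoice leaves one reason code in a plain list (the `reason_log` data and the `top_exception_reasons` helper are hypothetical) and simply counts which reasons dominate.

```python
from collections import Counter


def top_exception_reasons(reason_log, n=3):
    """Return the n most frequent reason codes with their counts."""
    return Counter(reason_log).most_common(n)


# Hypothetical week of review-queue entries.
reason_log = [
    "VAT mismatch > tolerance",
    "unknown supplier mapping",
    "unknown supplier mapping",
    "duplicate invoice number",
    "unknown supplier mapping",
    "VAT mismatch > tolerance",
]

print(top_exception_reasons(reason_log, 2))
# [('unknown supplier mapping', 3), ('VAT mismatch > tolerance', 2)]
```

In this made-up sample, the most frequent reason suggests the next automation gain would come from adding supplier mappings rather than from tuning extraction.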

In this case, once the team saw stable draft quality and low exception noise, expanding automation became a process decision, not a technical leap.
