Question 1

What file formats are supported?

Accepted Answer

PDF (single-page and multi-page), JPEG, PNG, WebP, and TIFF. Files are uploaded directly to the API, with a maximum file size of 20 MB. Page limits vary by plan.
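A client can reject unsupported or oversized files before spending an API call. The sketch below checks only the two limits stated above: the file extension and the 20 MB cap. The function name `check_upload` is hypothetical, and it assumes 20 MB means 20 * 1024 * 1024 bytes, which the API may define differently. Page limits are not checked because they vary by plan.

```python
import os

# Hypothetical client-side pre-check based on the stated limits:
# accepted formats and a 20 MB maximum file size.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".tif", ".tiff"}
MAX_BYTES = 20 * 1024 * 1024  # assumes 20 MiB; the API may use 20,000,000 bytes


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems
```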

Question 2

Do I need to configure the extraction schema?

Accepted Answer

No. The invoice schema is predefined: every response returns the same field set, with values normalized to standard formats. Support for custom schemas is planned for a future release.

Question 3

What are per-field confidence scores?

Accepted Answer

Every extracted field includes a confidence score from 0 to 1. High scores indicate reliable extraction; lower scores flag fields that may need a second look — useful for routing uncertain results to human review before writing to your system of record.
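One way to act on these scores is to split each result into fields that can be written straight through and fields that go to a reviewer. This is a minimal sketch assuming a hypothetical response shape, `{"fields": {name: {"value": ..., "confidence": ...}}}`, and an illustrative 0.85 threshold; neither comes from the API documentation.

```python
# Hypothetical per-field shape: {name: {"value": ..., "confidence": float}}
REVIEW_THRESHOLD = 0.85  # assumed cut-off; tune against your own error tolerance


def split_by_confidence(fields: dict, threshold: float = REVIEW_THRESHOLD):
    """Split extracted fields into auto-accepted values and values needing review."""
    accepted, review = {}, {}
    for name, field in fields.items():
        # Scores at or above the threshold are accepted; anything lower is flagged.
        target = accepted if field["confidence"] >= threshold else review
        target[name] = field["value"]
    return accepted, review
```

Only the `accepted` values would be written to the system of record automatically; the `review` dict is what a human checks first.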

Question 4

How does 2-pass extraction work?

Accepted Answer

On higher-tier plans, the API automatically runs a second extraction pass when critical fields are missing or have low confidence scores. The second pass targets only the failing fields rather than re-processing the entire document, improving accuracy without adding significant latency.
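The retry decision described above can be pictured as two steps: find the critical fields that are missing or below a confidence floor, then merge the targeted second-pass results back in. This is a conceptual sketch of that idea, not the service's implementation; `CRITICAL_FIELDS`, `MIN_CONFIDENCE`, and the field-dict shape are assumptions for illustration.

```python
# Conceptual sketch only; the real service's critical fields and thresholds
# are not documented here, so these values are assumptions.
CRITICAL_FIELDS = {"invoice_number", "invoice_date", "total_amount", "currency"}
MIN_CONFIDENCE = 0.8


def failing_fields(first_pass: dict) -> set:
    """Critical fields that are missing, empty, or below the confidence floor."""
    failing = set()
    for name in CRITICAL_FIELDS:
        field = first_pass.get(name)
        if (
            field is None
            or field.get("value") is None
            or field.get("confidence", 0.0) < MIN_CONFIDENCE
        ):
            failing.add(name)
    return failing


def merge_passes(first_pass: dict, second_pass: dict) -> dict:
    """Keep the higher-confidence result for each field re-extracted in pass two."""
    merged = dict(first_pass)
    for name, field in second_pass.items():
        current = merged.get(name)
        if current is None or field.get("confidence", 0.0) > current.get("confidence", 0.0):
            merged[name] = field
    return merged
```

Because only the fields returned by `failing_fields` are re-extracted, a targeted pass can stay cheap compared with re-processing the whole document.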

Question 5

What does MCP-ready mean?

Accepted Answer

Model Context Protocol (MCP) is an open standard for connecting AI agents to external tools. Document to JSON ships a native MCP server, so any MCP-compatible AI agent (including Claude, Cursor, and others) can call invoice extraction directly without custom integration code.
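Under the hood, an MCP client invokes a server tool by sending a JSON-RPC 2.0 `tools/call` request containing the tool's name and arguments. The sketch below only builds that message. The tool name `extract_invoice` and the `file_url` argument are hypothetical placeholders; the actual tool names and input schemas are whatever the Document to JSON server reports in response to a `tools/list` request.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument names; query tools/list for the real ones.
payload = build_tool_call(1, "extract_invoice", {"file_url": "https://example.com/invoice.pdf"})
```

In practice the agent or its MCP client library sends this message for you, which is why no extraction-specific integration code is needed.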

Question 6

Is extracted data stored?

Accepted Answer

By default, uploaded files and extracted results are stored for 30 days so you can retrieve them later. Retention can be disabled on a per-job basis — useful for sensitive documents where you want no data kept after processing. Billing and audit records are always retained.

Question 7

Can I use webhooks for async processing?

Accepted Answer

Yes. Provide a webhook URL and the API will deliver the result when extraction completes, fails, or returns a partial result. Responses are signed so you can verify authenticity. You can also poll the job status endpoint directly if you prefer.
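The FAQ does not say how responses are signed. A common scheme is an HMAC-SHA256 of the raw request body using a shared secret, sent hex-encoded in a header; the sketch below assumes exactly that, so confirm the algorithm, encoding, and header name against the API reference before relying on it. `verify_signature` is a hypothetical helper.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check a webhook signature, ASSUMING hex-encoded HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels.
    return hmac.compare_digest(expected, signature_header.strip().lower())
```

Verify against the raw bytes before parsing the JSON: re-serializing a parsed body can change whitespace or key order and break the signature.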

Documents to JSON, automated.

The extraction API for PDFs, invoices, and documents — structured output, zero configuration.

Invoice extraction API: quick start

How it works

Upload an invoice

Extraction runs automatically

Receive structured JSON with confidence scores
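The first step, uploading an invoice, might look like this with only the standard library. The endpoint URL, the bearer-token `Authorization` header, and sending the file as the raw request body are all assumptions for illustration (many upload APIs expect `multipart/form-data` instead); check the API reference for the real contract.

```python
import os
import urllib.request

# Hypothetical endpoint; substitute the value from the API reference.
API_URL = "https://api.example.com/v1/invoices/extract"

CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def build_upload_request(filename: str, data: bytes, api_key: str) -> urllib.request.Request:
    """Wrap the invoice bytes in a POST request (built here, not sent)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {ext or '(no extension)'}")
    return urllib.request.Request(
        API_URL,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": CONTENT_TYPES[ext],
        },
    )
```

Sending it with `urllib.request.urlopen(req)` and decoding the response with `json.load` would return the structured result, assuming the API answers synchronously; for longer jobs, the webhook and polling options in the FAQ apply.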

What gets extracted

Vendor & buyer details

Invoice identification

Amounts & currency

Line items

Payment accounts

Payment reference & notes
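To make these field groups concrete, here is one way a client might type the response. Every field name is hypothetical; the actual keys come from the predefined schema the API returns. The only detail taken from this page is that each field carries a confidence score from 0 to 1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Field:
    value: Optional[str]
    confidence: float  # 0 to 1


@dataclass
class LineItem:
    description: Field
    quantity: Field
    unit_price: Field
    amount: Field


@dataclass
class InvoiceResult:
    # Vendor & buyer details (hypothetical names)
    vendor_name: Field
    buyer_name: Field
    # Invoice identification
    invoice_number: Field
    invoice_date: Field
    # Amounts & currency
    total_amount: Field
    currency: Field
    # Line items
    line_items: list[LineItem] = field(default_factory=list)
    # Payment accounts (e.g. bank account numbers)
    payment_accounts: list[Field] = field(default_factory=list)
    # Payment reference & notes
    payment_reference: Optional[Field] = None
    notes: Optional[Field] = None
```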

Frequently asked questions
