feat: Add AzureDocumentIntelligenceConverter using the azure-ai-documentintelligence #10322

vblagoje · 2026-01-08T11:26:49Z

Why

Azure deprecated azure-ai-formrecognizer in favor of azure-ai-documentintelligence (v1.0.0, GA Dec 2024). The new package supports a Markdown output format (GitHub Flavored Markdown), which is better suited to RAG/LLM applications: tables stay inline with their surrounding context, document structure (headings, lists) is preserved, and no manual assembly is required.

Fixes #8404 (feat: Add a Azure OCR Converter that uses the azure-ai-documentintelligence library)

What

Added AzureDocumentIntelligenceConverter component:

Uses azure-ai-documentintelligence>=1.0.0 package (2024-11-30 API)
Markdown output mode (default): single document with inline tables, preserved structure
Text output mode (backward compat): separate CSV table documents or markdown tables
Simplified API: removed page_layout, threshold_y, preceding_context_len, following_context_len, merge_multiple_column_headers
Added output_format (markdown/text), table_format (csv/markdown)

Deprecated AzureOCRDocumentConverter (removal in Haystack 2.25)
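
To illustrate the difference between the two table_format values, here is a minimal standalone sketch that renders the same table as CSV and as a GitHub Flavored Markdown table. It is not the converter's implementation; the helper names table_to_csv and table_to_markdown and the sample rows are assumptions made for illustration only.

```python
import csv
import io


def table_to_csv(rows: list[list[str]]) -> str:
    """Render rows as CSV text (illustrative stand-in for table_format="csv")."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def table_to_markdown(rows: list[list[str]]) -> str:
    """Render rows as a GFM table (illustrative stand-in for table_format="markdown")."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Hypothetical sample data, not taken from a real Azure response.
rows = [["Item", "Qty"], ["Widget", "2"]]
print(table_to_csv(rows))
print(table_to_markdown(rows))
```

In CSV form the table is a natural fit for a separate document, while the Markdown form can sit inline with the surrounding text.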

How can it be used

  import os

  from haystack.components.converters import AzureDocumentIntelligenceConverter
  from haystack.utils import Secret

  # Markdown mode (recommended for RAG)
  converter = AzureDocumentIntelligenceConverter(
      endpoint=os.environ["AZURE_DI_ENDPOINT"],
      api_key=Secret.from_env_var("AZURE_AI_API_KEY"),
      output_format="markdown"
  )
  results = converter.run(sources=["invoice.pdf"])
  # Returns single document with markdown, tables inline

  # Text mode (backward compat)
  converter = AzureDocumentIntelligenceConverter(
      endpoint=os.environ["AZURE_DI_ENDPOINT"],
      api_key=Secret.from_env_var("AZURE_AI_API_KEY"),
      output_format="text",
      table_format="csv"
  )
  # Returns separate CSV table documents + text document

How did you test it

3 unit tests (init, to_dict, from_dict)
4 integration tests with real Azure API (markdown output, text+CSV tables, metadata handling, multiple files)

Notes for the reviewer

Migration path from old converter:

page_layout="natural" → output_format="markdown"
Remove context/layout params (Azure API handles this now)
Tables inline in markdown mode vs separate CSV docs
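
The migration steps above can be sketched as a small keyword-argument translation. This is a hypothetical helper, not part of the PR: the name migrate_converter_kwargs is invented here, and the sketch assumes the remaining init parameters (such as endpoint and api_key) carry over unchanged. Only the page_layout="natural" mapping is taken from the notes above; other page_layout values are simply dropped.

```python
from typing import Any

# Parameters the PR description lists as removed from the new converter.
REMOVED_PARAMS = {
    "page_layout",
    "threshold_y",
    "preceding_context_len",
    "following_context_len",
    "merge_multiple_column_headers",
}


def migrate_converter_kwargs(old_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: translate old converter kwargs to the new converter's kwargs."""
    # Drop every parameter that no longer exists.
    new_kwargs = {k: v for k, v in old_kwargs.items() if k not in REMOVED_PARAMS}
    # The only mapping stated in the migration notes: natural layout -> markdown output.
    if old_kwargs.get("page_layout") == "natural":
        new_kwargs.setdefault("output_format", "markdown")
    return new_kwargs


old = {"endpoint": "https://example.invalid", "page_layout": "natural", "threshold_y": 0.05}
print(migrate_converter_kwargs(old))
```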


vercel · 2026-01-08T11:26:54Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Review	Updated (UTC)
haystack-docs	Ignored	Preview	Jan 8, 2026 1:15pm

anakin87 · 2026-01-09T08:57:21Z

@vblagoje @sjrl could this be a good opportunity to move this component to core-integrations?
Just an idea, curious to hear your opinions.

sjrl · 2026-01-09T09:11:44Z

Yeah I think this would be a good time to move this to core integrations

See this comment #8404 (comment) from the issue thread

vblagoje · 2026-01-12T08:23:14Z

Deal, moving to https://github.com/deepset-ai/haystack-core-integrations/

vblagoje · 2026-01-12T15:12:45Z

Superseded by deepset-ai/haystack-core-integrations#2717. Closing.

vblagoje added 4 commits January 7, 2026 14:59

Add AzureDocumentIntelligenceConverter using the azure-ai-documentint…

baa1f2c

…elligence package

Merge branch 'main' into azure_doc_intelligence

11f8e57

Merge branch 'main' into azure_doc_intelligence

1c29cff

Use double backticks in repo notes

31d4ae9

github-actions bot added topic:tests topic:build/distribution type:documentation Improvements on the docs labels Jan 8, 2026

Add AZURE_AI_API_KEY and AZURE_DI_ENDPOINT env vars from Github secrets

857dfd8

github-actions bot added the topic:CI label Jan 8, 2026

vblagoje added 2 commits January 8, 2026 13:56

Linting

b41ab47

More linting

53362e7

vblagoje closed this Jan 12, 2026
