@adamsitnik adamsitnik commented Oct 27, 2025


Copilot AI review requested due to automatic review settings October 27, 2025 16:43
@adamsitnik adamsitnik requested a review from a team as a code owner October 27, 2025 16:43

Copilot AI left a comment

Pull Request Overview

This PR introduces Markdown reading capabilities to the data ingestion library through two new readers: MarkdownReader for native Markdown files and MarkItDownReader that leverages the external MarkItDown tool to convert various document formats to Markdown before parsing.

Key changes:

  • Adds MarkdownReader for parsing .md files using the Markdig library
  • Adds MarkItDownReader that wraps the MarkItDown CLI tool to convert documents (PDF, DOCX, etc.) to Markdown
  • Introduces shared MarkdownParser to parse Markdig AST into IngestionDocument model
  • Implements comprehensive test suite with conformance tests and format-specific test cases
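
As a rough illustration of the shared parsing step described above, the following sketch walks the top-level blocks of a Markdig AST and maps headings and paragraphs to a flat list. The `Element` record and the `MarkdownSketch` class are hypothetical stand-ins invented for this example; they are not the actual `IngestionDocument` model or the `MarkdownParser` from this PR, which also handles tables, images, and nested sections.

```csharp
using System.Collections.Generic;
using System.Linq;
using Markdig;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

// Hypothetical stand-in for the document model; NOT the real IngestionDocument API.
public sealed record Element(string Kind, int Level, string Text);

public static class MarkdownSketch
{
    // Parses Markdown with Markdig and maps top-level headings and paragraphs
    // to Element instances. Other block types are ignored in this sketch.
    public static List<Element> Parse(string markdown)
    {
        MarkdownDocument ast = Markdown.Parse(markdown);
        var elements = new List<Element>();

        foreach (Block block in ast)
        {
            switch (block)
            {
                case HeadingBlock heading:
                    elements.Add(new Element("header", heading.Level, GetText(heading.Inline)));
                    break;
                case ParagraphBlock paragraph:
                    elements.Add(new Element("paragraph", 0, GetText(paragraph.Inline)));
                    break;
            }
        }

        return elements;
    }

    // Concatenates the literal text fragments found inside an inline container.
    private static string GetText(ContainerInline inline) =>
        inline is null
            ? string.Empty
            : string.Concat(inline.Descendants<LiteralInline>().Select(l => l.Content.ToString()));
}
```

Calling `MarkdownSketch.Parse("# Title\n\nHello world")` would yield one header element at level 1 followed by one paragraph element. A real parser would also need to handle emphasis, links, and code spans, which this sketch flattens or drops.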

Reviewed Changes

Copilot reviewed 12 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| MarkdownReader.cs | Implements reader for native Markdown files using Markdig parser |
| MarkdownParser.cs | Core parsing logic converting Markdig AST to IngestionDocument model |
| MarkItDownReader.cs | Wraps MarkItDown CLI tool to convert various document formats to Markdown |
| Microsoft.Extensions.DataIngestion.Markdown.csproj | Project file for MarkdownReader with Markdig dependency |
| Microsoft.Extensions.DataIngestion.MarkItDown.csproj | Project file for MarkItDownReader, shares MarkdownParser code |
| DocumentReaderConformanceTests.cs | Base test class defining conformance tests for document readers |
| MarkdownReaderTests.cs | Tests specific to MarkdownReader functionality |
| MarkItDownReaderTests.cs | Tests specific to MarkItDownReader with CLI availability checks |
| ArrayUtils.cs | Test utility for mapping 2D arrays used in table assertions |
| Microsoft.Extensions.DataIngestion.Tests.csproj | Updated project file adding references and test file configuration |
| General.props | Adds Markdig package reference |
| Versions.props | Specifies Markdig version 0.42.0 |

@adamsitnik adamsitnik requested review from cincuranet and roji October 29, 2025 12:54
@ericstj ericstj requested a review from Copilot October 30, 2025 01:15

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 13 out of 20 changed files in this pull request and generated 3 comments.

# Conflicts:
#	test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Microsoft.Extensions.DataIngestion.Tests.csproj
- delete temporary file when .CopyToAsync fails
- handle all image types
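
The first commit note above (deleting the temporary file when `.CopyToAsync` fails) describes a common cleanup pattern. A minimal sketch of it, assuming the reader copies an incoming stream to a temporary file before handing it to an external tool, could look like the code below. The `TempFileSketch` class and `CopyToTempFileAsync` method are hypothetical names for illustration, not code taken from this PR.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class TempFileSketch
{
    // Copies the source stream to a new temporary file and returns its path.
    // If the copy throws, the partially written file is deleted before rethrowing.
    public static async Task<string> CopyToTempFileAsync(
        Stream source, CancellationToken cancellationToken = default)
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        try
        {
            using (var target = new FileStream(
                path, FileMode.CreateNew, FileAccess.Write, FileShare.None, 4096, useAsync: true))
            {
                await source.CopyToAsync(target, 81920, cancellationToken).ConfigureAwait(false);
            }

            return path;
        }
        catch
        {
            // File.Delete does nothing if the file was never created.
            File.Delete(path);
            throw;
        }
    }
}
```

The `using` block closes the file handle before the `catch` runs, so the delete is not blocked by an open handle. On success the caller owns the returned path and is responsible for deleting the file once it is no longer needed.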
@adamsitnik adamsitnik requested a review from cincuranet October 30, 2025 15:44
@adamsitnik adamsitnik requested a review from ericstj October 30, 2025 16:03
This was referenced Nov 13, 2025