diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/Microsoft.Extensions.DataIngestion.Abstractions.csproj b/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/Microsoft.Extensions.DataIngestion.Abstractions.csproj index ba7ec363bc8..9740d6cf4b7 100644 --- a/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/Microsoft.Extensions.DataIngestion.Abstractions.csproj +++ b/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/Microsoft.Extensions.DataIngestion.Abstractions.csproj @@ -3,13 +3,14 @@ $(TargetFrameworks);netstandard2.0 Microsoft.Extensions.DataIngestion - - - false + Abstractions representing Data Ingestion components for RAG. + RAG + preview + false + 75 + 75 $(NoWarn);S1694 - preview - false diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/README.md b/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/README.md new file mode 100644 index 00000000000..0285f27fb3d --- /dev/null +++ b/src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/README.md @@ -0,0 +1,39 @@ +# Microsoft.Extensions.DataIngestion.Abstractions + +.NET developers need to efficiently process, chunk, and retrieve information from diverse document formats while preserving semantic meaning and structural context. The `Microsoft.Extensions.DataIngestion` libraries provide a unified approach for representing document ingestion components. 
+ +## The packages + +The [Microsoft.Extensions.DataIngestion.Abstractions](https://www.nuget.org/packages/Microsoft.Extensions.DataIngestion.Abstractions) package provides the core exchange types, including [`IngestionDocument`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestiondocument), [`IngestionChunker`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestionchunker-1), [`IngestionChunkProcessor`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestionchunkprocessor-1), and [`IngestionChunkWriter`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestionchunkwriter-1). Any .NET library that provides document processing capabilities can implement these abstractions to enable seamless integration with consuming code. + +The [Microsoft.Extensions.DataIngestion](https://www.nuget.org/packages/Microsoft.Extensions.DataIngestion) package has an implicit dependency on the `Microsoft.Extensions.DataIngestion.Abstractions` package. This package enables you to easily integrate components such as enrichment processors, vector storage writers, and telemetry into your applications using familiar dependency injection and pipeline patterns. For example, it provides processors for sentiment analysis, keyword extraction, and summarization that can be chained together in ingestion pipelines. + +## Which package to reference + +Libraries that provide implementations of the abstractions typically reference only `Microsoft.Extensions.DataIngestion.Abstractions`. + +To also have access to higher-level utilities for working with document ingestion components, reference the `Microsoft.Extensions.DataIngestion` package instead (which itself references `Microsoft.Extensions.DataIngestion.Abstractions`). 
Most consuming applications and services should reference the `Microsoft.Extensions.DataIngestion` package along with one or more libraries that provide concrete implementations of the abstractions, such as `Microsoft.Extensions.DataIngestion.MarkItDown` or `Microsoft.Extensions.DataIngestion.Markdig`. + +## Install the package + +From the command-line: + +```console +dotnet add package Microsoft.Extensions.DataIngestion.Abstractions --prerelease +``` + +Or directly in the C# project file: + +```xml + + + +``` + +## Documentation + +Refer to the [Microsoft.Extensions.DataIngestion libraries documentation](https://learn.microsoft.com/dotnet/dataingestion/microsoft-extensions-dataingestion) for more information and API usage examples. + +## Feedback & Contributing + +We welcome feedback and contributions in [our GitHub repo](https://github.com/dotnet/extensions). diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/Microsoft.Extensions.DataIngestion.MarkItDown.csproj b/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/Microsoft.Extensions.DataIngestion.MarkItDown.csproj index 74fab572963..7d97c5631fb 100644 --- a/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/Microsoft.Extensions.DataIngestion.MarkItDown.csproj +++ b/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/Microsoft.Extensions.DataIngestion.MarkItDown.csproj @@ -3,11 +3,12 @@ $(TargetFrameworks);netstandard2.0 Microsoft.Extensions.DataIngestion - - - false + Implementation of IngestionDocumentReader abstraction for MarkItDown. 
+ RAG preview false + 75 + 75 diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/README.md b/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/README.md new file mode 100644 index 00000000000..79e340fad9a --- /dev/null +++ b/src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/README.md @@ -0,0 +1,36 @@ +# Microsoft.Extensions.DataIngestion.MarkItDown + +Provides an implementation of the `IngestionDocumentReader` class for the [MarkItDown](https://github.com/microsoft/markitdown/) utility. + +## Install the package + +From the command-line: + +```console +dotnet add package Microsoft.Extensions.DataIngestion.MarkItDown --prerelease +``` + +Or directly in the C# project file: + +```xml + + + +``` + +## Usage Examples + +### Creating a MarkItDownReader for Data Ingestion + +```csharp +using Microsoft.Extensions.DataIngestion; + +IngestionDocumentReader reader = + new MarkItDownReader(new FileInfo(@"pathToMarkItDown.exe"), extractImages: true); + +using IngestionPipeline pipeline = new(reader, CreateChunker(), CreateWriter()); +``` + +## Feedback & Contributing + +We welcome feedback and contributions in [our GitHub repo](https://github.com/dotnet/extensions). diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/Microsoft.Extensions.DataIngestion.Markdig.csproj b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/Microsoft.Extensions.DataIngestion.Markdig.csproj index eb47fca74b7..131c8a15bce 100644 --- a/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/Microsoft.Extensions.DataIngestion.Markdig.csproj +++ b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/Microsoft.Extensions.DataIngestion.Markdig.csproj @@ -3,11 +3,12 @@ $(TargetFrameworks);netstandard2.0 Microsoft.Extensions.DataIngestion - - - false + Implementation of IngestionDocumentReader abstraction for Markdown. 
+ RAG preview false + 75 + 75 diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/README.md b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/README.md new file mode 100644 index 00000000000..c6a2328699c --- /dev/null +++ b/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/README.md @@ -0,0 +1,35 @@ +# Microsoft.Extensions.DataIngestion.Markdig + +Provides an implementation of the `IngestionDocumentReader` class for Markdown files using the [Markdig](https://github.com/xoofx/markdig) library. + +## Install the package + +From the command-line: + +```console +dotnet add package Microsoft.Extensions.DataIngestion.Markdig --prerelease +``` + +Or directly in the C# project file: + +```xml + + + +``` + +## Usage Examples + +### Creating a MarkdownReader for Data Ingestion + +```csharp +using Microsoft.Extensions.DataIngestion; + +IngestionDocumentReader reader = new MarkdownReader(); + +using IngestionPipeline pipeline = new(reader, CreateChunker(), CreateWriter()); +``` + +## Feedback & Contributing + +We welcome feedback and contributions in [our GitHub repo](https://github.com/dotnet/extensions). diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion/Microsoft.Extensions.DataIngestion.csproj b/src/Libraries/Microsoft.Extensions.DataIngestion/Microsoft.Extensions.DataIngestion.csproj index 856c8811a02..082677dc4fd 100644 --- a/src/Libraries/Microsoft.Extensions.DataIngestion/Microsoft.Extensions.DataIngestion.csproj +++ b/src/Libraries/Microsoft.Extensions.DataIngestion/Microsoft.Extensions.DataIngestion.csproj @@ -3,14 +3,14 @@ $(TargetFrameworks);netstandard2.0 Microsoft.Extensions.DataIngestion - + Data Ingestion utilities for RAG. 
+ RAG true false - - - false preview false + 75 + 75 diff --git a/src/Libraries/Microsoft.Extensions.DataIngestion/README.md b/src/Libraries/Microsoft.Extensions.DataIngestion/README.md new file mode 100644 index 00000000000..9886465cff6 --- /dev/null +++ b/src/Libraries/Microsoft.Extensions.DataIngestion/README.md @@ -0,0 +1,34 @@ +# Microsoft.Extensions.DataIngestion + +.NET developers need to efficiently process, chunk, and retrieve information from diverse document formats while preserving semantic meaning and structural context. The `Microsoft.Extensions.DataIngestion` libraries provide a unified approach for representing document ingestion components. + +## The packages + +The [Microsoft.Extensions.DataIngestion.Abstractions](https://www.nuget.org/packages/Microsoft.Extensions.DataIngestion.Abstractions) package provides the core exchange types, including [`IngestionDocument`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestiondocument), [`IngestionChunker`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestionchunker-1), [`IngestionChunkProcessor`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestionchunkprocessor-1), and [`IngestionChunkWriter`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.ingestionchunkwriter-1). Any .NET library that provides document processing capabilities can implement these abstractions to enable seamless integration with consuming code. + +The [Microsoft.Extensions.DataIngestion](https://www.nuget.org/packages/Microsoft.Extensions.DataIngestion) package has an implicit dependency on the `Microsoft.Extensions.DataIngestion.Abstractions` package. This package enables you to easily integrate components such as enrichment processors, vector storage writers, and telemetry into your applications using familiar dependency injection and pipeline patterns. 
For example, it provides the [`SentimentEnricher`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.sentimentenricher), [`KeywordEnricher`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.keywordenricher), and [`SummaryEnricher`](https://learn.microsoft.com/dotnet/api/microsoft.extensions.dataingestion.summaryenricher) processors that can be chained together in ingestion pipelines. + +## Which package to reference + +Libraries that provide implementations of the abstractions typically reference only `Microsoft.Extensions.DataIngestion.Abstractions`. + +To also have access to higher-level utilities for working with document ingestion components, reference the `Microsoft.Extensions.DataIngestion` package instead (which itself references `Microsoft.Extensions.DataIngestion.Abstractions`). Most consuming applications and services should reference the `Microsoft.Extensions.DataIngestion` package along with one or more libraries that provide concrete implementations of the abstractions, such as `Microsoft.Extensions.DataIngestion.MarkItDown` or `Microsoft.Extensions.DataIngestion.Markdig`. + +## Install the package + +From the command-line: + +```console +dotnet add package Microsoft.Extensions.DataIngestion --prerelease +``` +Or directly in the C# project file: + +```xml + + + +``` + +## Feedback & Contributing + +We welcome feedback and contributions in [our GitHub repo](https://github.com/dotnet/extensions).