SemanticKernel Memory pipeline for Agent Frameworks

This repository contains an advanced Memory Pipeline designed to enhance the context management and information retrieval capabilities of Semantic Kernel agents. The pipeline integrates various memory storage and retrieval strategies, enabling agents to maintain, update, and retrieve memory effectively across interactions.

Features

  • Advanced Text Chunking: Supports both simple size-based chunking and intelligent semantic chunking based on document structure
  • Semantic Chunking: Creates meaningful chunks by detecting document headings and structure (Markdown, underlined, numbered headings)
  • Dependency Injection: Full integration with Microsoft.Extensions.DependencyInjection for easy configuration and testing
  • Fluent Configuration API: Intuitive configuration syntax
  • Modular memory components supporting vector stores, databases, and custom memory handlers
  • Efficient embedding and semantic search integration for context-aware retrieval
  • Easy integration with Semantic Kernel SDK and extensible architecture for custom memory logic

Installation

Install Packages

Install the packages you need:

# Core memory pipeline functionality
dotnet add package SemanticKernel.Agents.Memory.Core

# Abstractions referenced by the pipeline (also shown in the .csproj example below)
dotnet add package SemanticKernel.Agents.Memory.Abstractions

Alternatively, you can add the packages directly to your .csproj file:

<PackageReference Include="SemanticKernel.Agents.Memory.Core" />
<PackageReference Include="SemanticKernel.Agents.Memory.Abstractions" />

Getting Started

Demo setup

The demo code in samples/SemanticKernel.Agents.Memory.Samples/PipelineDemo.cs registers the required services and composes the pipeline using the fluent API. The example below shows how to configure the pipeline:

services.AddAzureOpenAITextEmbeddingGeneration(
    deploymentName: "NAME_OF_YOUR_DEPLOYMENT", // Name of deployment, e.g. "text-embedding-ada-002".
    endpoint: "YOUR_AZURE_ENDPOINT",           // Name of Azure OpenAI service endpoint, e.g. https://myaiservice.openai.azure.com.
    apiKey: "YOUR_API_KEY",
    modelId: "MODEL_ID",          // Optional name of the underlying model if the deployment name doesn't match the model name, e.g. text-embedding-ada-002.
    serviceId: "YOUR_SERVICE_ID", // Optional; for targeting specific services within Semantic Kernel.
    dimensions: 1536              // Optional number of dimensions to generate embeddings with.
);

services.AddAzureOpenAIChatCompletion(
    deploymentName: "NAME_OF_YOUR_DEPLOYMENT",
    apiKey: "YOUR_API_KEY",
    endpoint: "YOUR_AZURE_ENDPOINT",
    modelId: "gpt-4", // Optional name of the underlying model if the deployment name doesn't match the model name
    serviceId: "YOUR_SERVICE_ID" // Optional; for targeting specific services within Semantic Kernel
);

var memoryStore = new InMemoryVectorStore(); // or your vector store implementation

// Configure the memory ingestion pipeline using the fluent API
services.ConfigureMemoryIngestion(options =>
{
    options
        // Use MarkitDown extraction service running locally
        .WithMarkitDownTextExtraction("http://localhost:5000")
        // Semantic (structure-aware) chunking with lambda configuration
        .WithSemanticChunking(() => new SemanticChunkingOptions
        {
            MaxChunkSize = 500,         // Max characters per chunk
            MinChunkSize = 100,         // Minimum characters per chunk for structure-aware splitting
            TitleLevelThreshold = 3,    // Consider headings up to this level as titles
            IncludeTitleContext = true, // Include heading/title text in chunk context
            TextOverlap = 50            // Overlapping characters between adjacent chunks
        })
        // Handler that generates embeddings (uses configured Azure OpenAI or mock generator)
        .WithDefaultEmbeddingsGeneration()
        // Save records using an in-memory vector store instance (samples use this for demos)
        .WithSaveRecords(memoryStore);
});

services.AddMemorySearchClient(memoryStore, new SearchClientOptions
{
    MaxMatchesCount = 10,        // Max search results to retrieve
    AnswerTokens = 300,          // Max tokens for AI-generated answers
    Temperature = 0.7,           // LLM creativity (0.0 = deterministic, 1.0 = creative)
    MinRelevance = 0.6           // Minimum relevance score for results
});
...
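// The service provider used below is assumed to come from a standard
// Microsoft.Extensions.DependencyInjection setup, built once all services are registered
// (this step is not shown in the sample excerpt above):
var serviceProvider = services.BuildServiceProvider();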

// Get the orchestrator from the service provider
var orchestrator = serviceProvider.GetRequiredService<ImportOrchestrator>();

// Get the search client from the service provider
ISearchClient searchClient = serviceProvider.GetRequiredService<ISearchClient>();

// Create a file upload request using the fluent builder API
_ = await orchestrator.ProcessUploadAsync(index: "default",
    orchestrator.NewDocumentUpload()
        .WithFile("path/to/document.pdf")
        .WithTag("document-type", "technical")
        .WithTag("priority", "high")
        .WithContext("source", "user-upload")
        .Build());

var query = "your search question"; // Placeholder query for this example

var searchResult = await searchClient.SearchAsync(
    index: "default",
    query: query,
    minRelevance: 0.7,
    limit: 5
);

Console.WriteLine($"Found {searchResult.Results.Count} results for: {query}");

foreach (var result in searchResult.Results)
{
    Console.WriteLine($"• {result.Source} (Score: {result.RelevanceScore:F3})");
    Console.WriteLine($"  Content: {result.Content.Substring(0, Math.Min(150, result.ContentLength))}...");
}

Running the Sample

To see the memory pipeline in action, run the sample application:

cd samples/SemanticKernel.Agents.Memory.Samples
dotnet run

The sample application demonstrates several demos (see PipelineDemo.cs):

  • Basic pipeline demo using simple, size-based chunking (RunAsync)
  • Semantic chunking demo that uses document structure (RunSemanticChunkingAsync)
  • Custom handler / services demo showing how to register additional services (RunCustomHandlerAsync)
  • Semantic chunking configuration demo with fine-grained options (RunSemanticChunkingConfigDemo)

Running the MarkitDown extraction service

The samples call a small helper service (MarkitDown) to extract and preprocess documents. You can run it either directly with Python or via Docker. The service listens on port 5000 by default and the samples use the URL http://localhost:5000.

Run with Python (recommended for development):

python3 -m venv .venv
source .venv/bin/activate
pip install -r services/markitdown-service/requirements.txt
python services/markitdown-service/app.py

Run with Docker:

docker build -t markitdown-service services/markitdown-service
docker run --rm -p 5000:5000 markitdown-service
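
To check that the service is reachable before running the samples, a plain HTTP request against the base URL is enough (this only verifies the port is listening; the samples themselves only need the base URL http://localhost:5000):

curl -i http://localhost:5000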

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.
