
Chat template defaulting to GitHub Models means ingestion is impractical beyond 'hello world' case #6136

Open

SteveSandersonMS opened this issue Mar 17, 2025 · 2 comments
Labels: area-ai-templates Microsoft.Extensions.AI.Templates

Comments

SteveSandersonMS (Member) commented Mar 17, 2025

I was trying out ingesting data beyond the two sample PDFs and quickly ran into rate limit issues:

Unhandled exception. System.ClientModel.ClientResultException: HTTP 429 (: RateLimitReached)

Rate limit of 20 per 60s exceeded for UserByModelByMinute. Please wait 0 seconds before retrying.
   at OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)

This rate limit is unfortunate. More info about it:

[screenshot: GitHub Models rate limit details]

This leads to several problems with the existing ingestion mechanism:

  1. It runs out of requests very quickly. Currently it calls the generator once per source document, which means you couldn't ingest more than 150 docs/day. In any case, you'd hit per-minute rate limits much sooner than that (e.g., eShopSupport tries to ingest 200 docs, and would hit per-minute limits every 15 docs).
  2. We don't handle rate limit errors. They just cause ingestion to fail. (A retry sketch follows below.)
  3. We don't manage the number of input tokens. We just send all the chunks from an entire source document in one request, even if the total is over 64k tokens, which would also fail. (A batching sketch follows after the strategies list.)

What we have is OK for "hello world" cases (2 small PDFs) but doesn't address the question of scaling beyond that. It's good that the template is labelled "preview" because this is quite a basic issue we need to resolve.
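
For illustration, a minimal retry wrapper around the embedding calls could look like the following. This is only a sketch: GenerateWithRetryAsync and the backoff values are hypothetical, and it assumes the 429 surfaces as a ClientResultException whose Status property carries the HTTP status code.

using System.ClientModel;
using Microsoft.Extensions.AI;

static async Task<GeneratedEmbeddings<Embedding<float>>> GenerateWithRetryAsync(
    IEmbeddingGenerator<string, Embedding<float>> generator,
    IList<string> chunks,
    CancellationToken cancellationToken = default)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await generator.GenerateAsync(chunks, cancellationToken: cancellationToken);
        }
        catch (ClientResultException ex) when (ex.Status == 429 && attempt < 5)
        {
            // Back off before retrying. The "Please wait N seconds" hint in the
            // error could be parsed instead of using this fixed schedule.
            await Task.Delay(TimeSpan.FromSeconds(30 * attempt), cancellationToken);
        }
    }
}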

Possible strategies:

  • We could do in-proc embedding generation (e.g., via ONNX Runtime) instead of calling an external embedding generator service
  • We could reconsider the whole approach to ingestion so it's less frameworky and more clearly just one special case for a "getting started" template, making it obvious that developers are responsible for designing their own ingestion approach based on their needs
    • ... or we could go the other way and actually try to build something like a minimal ingestion framework that handles all the edge cases
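
As a rough illustration of the token management point above, ingestion could group chunks into batches that stay under the provider's input limit before calling the generator. A sketch only: the 60k budget and the ~4 characters-per-token estimate are assumptions, not measured values.

// Split chunks into batches whose estimated token count stays under a budget,
// so no single embedding request exceeds the provider's input-token limit.
static IEnumerable<List<string>> BatchByTokenBudget(
    IEnumerable<string> chunks, int maxTokensPerRequest = 60_000)
{
    var batch = new List<string>();
    var batchTokens = 0;

    foreach (var chunk in chunks)
    {
        var estimatedTokens = chunk.Length / 4; // crude heuristic for English text
        if (batch.Count > 0 && batchTokens + estimatedTokens > maxTokensPerRequest)
        {
            yield return batch;
            batch = new List<string>();
            batchTokens = 0;
        }
        batch.Add(chunk);
        batchTokens += estimatedTokens;
    }

    if (batch.Count > 0)
    {
        yield return batch;
    }
}

Each batch could then go through a retry wrapper like the one sketched above, keeping both the per-request token count and the per-minute request rate in check.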
SteveSandersonMS added the untriaged and area-ai-templates (Microsoft.Extensions.AI.Templates) labels Mar 17, 2025
jeffhandley (Member) commented

@SteveSandersonMS I wanted to confirm that this is the topic you referred to during our standup on Monday. Is that right?

EdCharbeneau commented Mar 20, 2025

Hitting the same issue.

Not sure this helps the team, but it may help users who are trying this out, hitting the limits, and simply want to keep using the preview without being blocked.

I switched the embedding client to a locally hosted model while keeping the chat client on a GitHub model.
WARNING: Switching to a local embedding model means regenerating all of your embeddings. You may need to delete the cache file ingestioncache.db.

The end result is something like this:

// Usings needed for this snippet:
// using System.ClientModel;
// using Microsoft.Extensions.AI;
// using OpenAI;

// Embedding client: locally hosted via Ollama
builder.AddOllamaApiClient("local-embedding").AddEmbeddingGenerator();

// Chat client: keep using GitHub Models for inference
var credential = new ApiKeyCredential(builder.Configuration["Chat:GitHubModels:Token"]
    ?? throw new InvalidOperationException("Missing configuration: Chat:GitHubModels:Token. See the README for details."));
var openAIOptions = new OpenAIClientOptions()
{
    Endpoint = new Uri("https://models.inference.ai.azure.com")
};

var innerClient = new OpenAIClient(credential, openAIOptions);

var client = innerClient.AsChatClient("gpt-4o-mini");
builder.Services.AddChatClient(client).UseFunctionInvocation().UseLogging();
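
For the embedding side to work, the "local-embedding" connection needs to point at a local Ollama instance running an embedding model (e.g., all-minilm). If you're not using the Aspire Ollama client integration, an equivalent plain registration might look like this (a sketch; the endpoint and model name are assumptions, and it uses the Microsoft.Extensions.AI.Ollama package):

builder.Services.AddEmbeddingGenerator(
    new OllamaEmbeddingGenerator(new Uri("http://localhost:11434"), "all-minilm"));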
