Copilot AI commented Dec 2, 2025

Fix Flaky Vector Store Integration Tests - Race Condition

  • Add helper method WaitForVectorStoreReadyAsync to OpenAI integration tests
  • Fix CreateAIAgentAsync_WithHostedFileSearchTool_SearchesFilesAsync in OpenAIAssistantClientExtensionsTests
  • Add helper method WaitForVectorStoreReadyAsync to Azure AI Persistent integration tests
  • Fix CreateAgent_CreatesAgentWithVectorStoresAsync in AzureAIAgentsPersistentCreateTests
  • Build and test the changes to verify fixes work correctly
  • Address code review feedback - use proper VectorStoreStatus struct comparison
  • Remove unused imports
  • Remove excessive comments per C# Unit Test Guidelines
  • Remove unnecessary status variable in Azure test
Original prompt

This section details the original issue you should resolve

<issue_title>.NET: [BUG] Flaky Integration Test: CreateAIAgentAsync_WithHostedFileSearchTool_SearchesFilesAsync - Vector Store Indexing Race Condition</issue_title>
<issue_description>## Summary
The integration test CreateAIAgentAsync_WithHostedFileSearchTool_SearchesFilesAsync in OpenAIAssistantClientExtensionsTests is flaky and intermittently fails due to a race condition with OpenAI's vector store indexing process.

Test Failure

Assert.Contains() Failure: Sub-string not found
String:    "I couldn't find any documented code for '"···
Not found: "673457"

Root Cause
The test creates a vector store with file content and immediately queries the AI agent:

// Lines 146-152: Create vector store and add file
var vectorStoreCreate = await vectorStoreClient.CreateVectorStoreAsync(options: new VectorStoreCreationOptions()
{
    Name = "WordCodeLookup_VectorStore",
    FileIds = { uploadedFileId }
});
string vectorStoreId = vectorStoreCreate.Value.Id;

// Lines 177-179: IMMEDIATELY query (no wait for indexing!)
var response = await agent.RunAsync("Can you give me the documented code for 'banana'?");
var text = response.ToString();
Assert.Contains("673457", text);  // ❌ FAILS - vector store not ready yet!

The Problem: Vector stores require time to index files before they're searchable. The test doesn't wait for indexing completion, creating a race condition.
Evidence of Race Condition
The agent responds with:
"I couldn't find any documented code for '"

This indicates the agent attempted file search but the vector store wasn't fully indexed yet, so no results were found.
Similar Pattern Found
The same pattern exists in AzureAIAgentsPersistentCreateTests.CreateAgent_CreatesAgentWithVectorStoresAsync (lines 68-127), which also creates a vector store and immediately queries it without waiting for indexing:

// tests/AzureAIAgentsPersistent.IntegrationTests/AzureAIAgentsPersistentCreateTests.cs
var vectorStoreMetadata = await this._persistentAgentsClient.VectorStores.CreateVectorStoreAsync([uploadedAgentFile.Id], name: "WordCodeLookup_VectorStore");

var agent = /* ... create agent with vector store ... */;

// IMMEDIATE query without waiting
var result = await agent.RunAsync("Can you give me the documented code for 'banana'?");
Assert.Contains("673457", result.ToString());  // Potentially flaky!

Impact

• ❌ CI/CD pipelines fail intermittently, which makes the PR pipeline a bottleneck...
• ❌ False test failures waste developer time...
• ❌ Difficult to reproduce locally (timing-dependent)
• ❌ Reduces confidence in the test suite
• ❌ Consumes valuable compute and developer time


Proposed Solutions
Solution 1: Wait for Vector Store Indexing Status (Recommended)
Add polling logic to wait until the vector store status is Completed:

// After creating the vector store
var vectorStoreCreate = await vectorStoreClient.CreateVectorStoreAsync(options: new VectorStoreCreationOptions()
{
    Name = "WordCodeLookup_VectorStore",
    FileIds = { uploadedFileId }
});
string vectorStoreId = vectorStoreCreate.Value.Id;

// ✅ NEW: Wait for indexing to complete
await WaitForVectorStoreReadyAsync(vectorStoreClient, vectorStoreId);

// Helper method
private static async Task WaitForVectorStoreReadyAsync(
    VectorStoreClient client, 
    string vectorStoreId,
    int maxWaitSeconds = 30)
{
    var sw = Stopwatch.StartNew();
    while (sw.Elapsed.TotalSeconds < maxWaitSeconds)
    {
        var vectorStore = await client.GetVectorStoreAsync(vectorStoreId);
        if (vectorStore.Value.Status == VectorStoreStatus.Completed)
        {
            return;  // ✅ Ready!
        }
        
        if (vectorStore.Value.Status == VectorStoreStatus.Failed)
        {
            throw new InvalidOperationException($"Vector store '{vectorStoreId}' indexing failed");
        }
        
        await Task.Delay(1000);  // Poll every second
    }
    
    throw new TimeoutException($"Vector store did not complete indexing within {maxWaitSeconds}s");
}

Affected Tests

  1. ✅ Confirmed Flaky:
    • OpenAIAssistantClientExtensionsTests.CreateAIAgentAsync_WithHostedFileSearchTool_SearchesFilesAsync
    • Location: tests/OpenAIAssistant.IntegrationTests/OpenAIAssistantClientExtensionsTests.cs:129-189
  2. 🔍 Potentially Flaky (Same Pattern):
    • AzureAIAgentsPersistentCreateTests.CreateAgent_CreatesAgentWithVectorStoresAsync
    • Location: tests/AzureAIAgentsPersistent.IntegrationTests/AzureAIAgentsPersistentCreateTests.cs:68-127

Recommendation

Implement Solution 1 (wait for vector store status) for both tests:

  1. ✅ Most reliable - no guessing on timing
  2. ✅ Self-documenting - makes async nature explicit
  3. ✅ Fails fast if indexing actually fails
  4. ✅ Works across different environments (local, CI, slow networks)

Related Information
• Test Run Log: GitHub Actions run showing failure
• OpenAI API Docs: Vector store indexing is asynchronous
• Similar Issues: Azure AI services...



Copilot AI changed the title from "[WIP] Fix flaky integration test for vector store indexing" to "Fix flaky vector store integration tests - race condition on indexing" on Dec 2, 2025
Copilot AI requested a review from rogerbarreto December 2, 2025 17:15
@github-actions github-actions bot changed the title from "Fix flaky vector store integration tests - race condition on indexing" to ".NET: Fix flaky vector store integration tests - race condition on indexing" on Dec 3, 2025
@rogerbarreto rogerbarreto marked this pull request as ready for review December 3, 2025 12:20
Copilot AI review requested due to automatic review settings December 3, 2025 12:20
Copilot AI left a comment

Pull request overview

This PR fixes a race condition in two flaky integration tests that were failing intermittently because they queried vector stores immediately after creation, before asynchronous file indexing completed. The fix introduces polling logic to wait for indexing completion before proceeding with agent queries.

Key Changes:

  • Added WaitForVectorStoreReadyAsync helper methods that poll vector store status every second (with 30s timeout)
  • Implemented status validation checking for Completed status, zero failed files, and handling Expired status
  • Applied the fix to both OpenAI and Azure AI Agents integration tests
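
The polling behaviour summarized in the key changes above can be sketched in isolation. This is a minimal, self-contained illustration, not the code merged in this PR: `StoreStatus`, `StoreSnapshot`, `VectorStorePolling`, and `WaitUntilReadyAsync` are hypothetical stand-ins for the SDK types, and the status lookup is passed in as a delegate so the loop can be exercised without a live service. It assumes the three conditions named in the review summary: return on a completed status with zero failed files, and fail immediately on failed files or an expired store.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the SDK's vector store types (illustrative names only).
public enum StoreStatus { InProgress, Completed, Expired }

public sealed record StoreSnapshot(StoreStatus Status, int FailedFileCount);

public static class VectorStorePolling
{
    // Polls getSnapshotAsync until the store is ready, reports a failure, or the timeout elapses.
    public static async Task WaitUntilReadyAsync(
        Func<CancellationToken, Task<StoreSnapshot>> getSnapshotAsync,
        TimeSpan timeout,
        TimeSpan pollInterval,
        CancellationToken cancellationToken = default)
    {
        var stopwatch = Stopwatch.StartNew();

        while (stopwatch.Elapsed < timeout)
        {
            StoreSnapshot snapshot = await getSnapshotAsync(cancellationToken);

            // Fail fast: waiting longer cannot recover files that failed to index.
            if (snapshot.FailedFileCount > 0)
            {
                throw new InvalidOperationException(
                    $"Vector store indexing failed for {snapshot.FailedFileCount} file(s).");
            }

            // An expired store will never become searchable.
            if (snapshot.Status == StoreStatus.Expired)
            {
                throw new InvalidOperationException("Vector store expired before indexing completed.");
            }

            if (snapshot.Status == StoreStatus.Completed)
            {
                return;
            }

            await Task.Delay(pollInterval, cancellationToken);
        }

        throw new TimeoutException(
            $"Vector store did not complete indexing within {timeout.TotalSeconds}s.");
    }
}
```

In a test, the delegate would wrap whatever call fetches the vector store from the real client. Keeping the timeout and poll interval as parameters lets the one-second interval and 30-second limit described above be shortened when the loop is run against a fake.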

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| dotnet/tests/OpenAIAssistant.IntegrationTests/OpenAIAssistantClientExtensionsTests.cs | Added System.Diagnostics import, WaitForVectorStoreReadyAsync static helper method, and call to wait for vector store indexing before agent queries |
| dotnet/tests/AzureAIAgentsPersistent.IntegrationTests/AzureAIAgentsPersistentCreateTests.cs | Added System.Diagnostics import, WaitForVectorStoreReadyAsync instance helper method, and call to wait for vector store indexing before agent queries |

@rogerbarreto

@copilot According to the C# Unit Test Guidelines, we should "avoid adding excessive comments to tests" and "favour clear easy to understand code." The comment "// Poll every second" is obvious from the code await Task.Delay(1000). This comment should be removed.

Copilot AI commented Dec 3, 2025

@copilot According to the C# Unit Test Guidelines, we should "avoid adding excessive comments to tests" and "favour clear easy to understand code." The comment "// Poll every second" is obvious from t...

All excessive comments removed in commit 45501c3.

@rogerbarreto rogerbarreto added this pull request to the merge queue Dec 3, 2025
Merged via the queue into main with commit 6835161 Dec 3, 2025
14 checks passed
@crickman crickman deleted the copilot/fix-flaky-integration-test branch December 4, 2025 17:16
This was referenced Dec 5, 2025
Development

Successfully merging this pull request may close these issues.

.NET: [BUG] Flaky Integration Test: CreateAIAgentAsync_WithHostedFileSearchTool_SearchesFilesAsync - Vector Store Indexing Race Condition

5 participants