-
-
Notifications
You must be signed in to change notification settings - Fork 341
fix(Milvus): Use healthcheck wait strategy #1585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(Milvus): Use healthcheck wait strategy #1585
Conversation
✅ Deploy Preview for testcontainers-dotnet ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughReplaced the HTTP wait strategy with a Docker container healthcheck in MilvusBuilder.Init by adding a private Healthcheck singleton and applying it via WithCreateParameterModifier; adjusted wait chaining. Tests updated to stop pinning an explicit Milvus version and assert image name suffix instead. No public API changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Test as Test
participant Builder as MilvusBuilder
participant Docker as Docker Engine
Test->>Builder: Create container config
Builder->>Builder: Configure wait strategy -> UntilContainerIsHealthy
Builder->>Builder: WithCreateParameterModifier(Healthcheck.Instance)
Builder->>Docker: Create container (with HEALTHCHECK CMD-SHELL curl ...)
Docker-->>Docker: Run healthcheck cycles (interval/timeout/retries)
Docker-->>Test: Report container healthy / unhealthy
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Testcontainers.Milvus/MilvusBuilder.cs(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: verdie-g
Repo: testcontainers/testcontainers-dotnet PR: 1569
File: src/Testcontainers.Milvus/MilvusBuilder.cs:67-67
Timestamp: 2025-11-10T09:58:21.688Z
Learning: In Milvus test containers (src/Testcontainers.Milvus/MilvusBuilder.cs), the DEPLOY_MODE=STANDALONE environment variable is a valid and official configuration option used in Milvus standalone deployments, as documented in the official Milvus standalone_embed.sh script.
📚 Learning: 2025-11-10T09:58:21.688Z
Learnt from: verdie-g
Repo: testcontainers/testcontainers-dotnet PR: 1569
File: src/Testcontainers.Milvus/MilvusBuilder.cs:67-67
Timestamp: 2025-11-10T09:58:21.688Z
Learning: In Milvus test containers (src/Testcontainers.Milvus/MilvusBuilder.cs), the DEPLOY_MODE=STANDALONE environment variable is a valid and official configuration option used in Milvus standalone deployments, as documented in the official Milvus standalone_embed.sh script.
Applied to files:
src/Testcontainers.Milvus/MilvusBuilder.cs
🧬 Code graph analysis (1)
src/Testcontainers.Milvus/MilvusBuilder.cs (1)
tests/Testcontainers.Tests/Unit/Configurations/TestcontainersWaitStrategyTest.cs (1)
Timeout(42-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: ci (Testcontainers, ubuntu-24.04)
- GitHub Check: ci (Testcontainers.Milvus, ubuntu-24.04)
- GitHub Check: analyze (csharp)
🔇 Additional comments (2)
src/Testcontainers.Milvus/MilvusBuilder.cs (2)
62-63: LGTM! Good reference to the official implementation.The inline comment clearly documents the source of the healthcheck logic and provides a specific commit reference for future maintainability.
88-88: LGTM! Correct approach to fix the readiness issue.Switching to
UntilContainerIsHealthy()ensures the container waits for Docker's native healthcheck to pass, which is the appropriate solution for the reported issue where/healthzreturned success before Milvus was fully ready to serve requests.
|
I double-checked the image configuration during my lunch break. I'm honestly wondering how this ever worked reliably. It looks like the image never had a health check configured, and that configuration has been needed from the start. I also double-checked the old version ( |
|
@HofmeisterAn I think the |
|
@verdie-g Thanks for checking it again. I didn't see it in the merge conflict. Now it should be correct. I thought the module already uses the health check to determine readiness. Switching from the HTTP wait strategy to the health check shouldn't behave any differently since they use the same API 🤔. |
Well, yes. I'm a bit puzzled by that. If that helps we can just rename that PR "mirror official milvus script" and I'll check if we still get the same error. |
This would be really helpful. From experience with other modules, relying on log messages is reliable. For Milvus, I did not find any log messages that truly indicate readiness. Maybe this is feedback we can provide to the Milvus project to let them know that their health check API is not reliable? It would be great if you can override the builder's wait strategy and test it: var milvusContainer = new MilvusBuilder()
.WithWaitStrategy(Wait.ForUnixContainer().UntilContainerIsHealthy())
.WithCreateParameterModifier(p =>
{
p.Healthcheck = new HealthcheckConfig
{
Test = ["CMD-SHELL", $"curl -f http://localhost:{MilvusBuilder.MilvusManagementPort}/healthz"],
Interval = TimeSpan.FromSeconds(30),
StartPeriod = 90 * 1_000_000_000L, // 90s
Timeout = TimeSpan.FromSeconds(20),
Retries = 3,
};
}); |
|
Without this change, I used the "Run Unit Test Until Failure" of my IDE and it failed after 2-3 minutes. With the change applied, I was not able to reproduce the problem after 10 minutes of "Run Unit Test Until Failure". So I cannot really explain how that helps but it does somehow :D |
One thing that has changed is the interval. By default, TC runs the wait strategy check every second. Could you try it again with a 1-second interval? We might want to consider reducing the interval, as 30 seconds might be a bit long. |
|
I can confirm that it doesn't take 30 seconds to start but more like 3-4 seconds. With the StartPeriod decreased to 1s, I was not able to reproduce the problem either. |
HofmeisterAn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, tests and feedback 👍.
|
@HofmeisterAn any chance we could have a release for this PR? I would like to use it right away in the Milvus client. |
No ETA. You can patch the default builder configuration, so the release isn't critical. |
Since we upgraded to Milvus 2.6, our tests started to randomly fail with
[service unavailable: internal: Milvus Proxy is not ready yet. please wait. More info here milvus-io/milvus-sdk-csharp#123.My understanding is that even though the request to /healthz succeeds, Milvus might not be completely ready yet.
The fix is to imitate exactly what https://github.com/milvus-io/milvus/blob/master/scripts/standalone_embed.sh does.