-
-
Notifications
You must be signed in to change notification settings - Fork 514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: logging deadlock #2346
fix: logging deadlock #2346
Conversation
✅ Deploy Preview for testcontainers-go ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Looks like 95% of tests are in the green, |
b2cc7d7
to
bdd8886
Compare
I've taken a deep look into this error, and as you pointed out, it seems consistent for the reaper-off mode: so there are containers that are not manually cleaned by the tests. And the test that leaks that container is building it from a Dockerfile. So it makes me think the culprit is Could you please take a look? 🙏 |
@mdelapenya @stevenh Not sure if this is helpful, but I ran into this issue (linked above) and debugged it. The multi-consumer aspect didn't have anything to do with it. I could replicate this with a single consumer. I can force the bug about 1/15 times. Here is the flow
|
That seems to be an analysis on the old code, that shouldn't be possible with the new version? |
@stevenh Probably not, I was more or less replying to #2346 (comment). I was also hitting this in CI and wondered what was happening. I think your waitGroup approach should work. If you like, I can commit my reduced test case where I was able to replicate this ( at least with high-probability), then test against that? |
The What's happening is the test cancels the main context causing the logger to stop. This returns context cancelled error which is subsequently returned by There's a few ways to fix:
It feels to me like 4. would be the best solution as it puts the caller in control of the behaviour. A tertiary issue then becomes should all lifecycleHooks iterators e.g. Thoughts? |
If you have a reproduction and want to try with this branch + the attached patch which makes lifecycles and Terminate best effort that would be helpful to confirm it fixes your case too. |
I like the best-effort patch for the lifecycle, which will make the potential errors more evident. For the terminate method, you want to continue with the terminated hook even if we failed the removal of the container, right? |
Yes that makes sense to me, I have a version of this from last night I'll tweak and put up see what folks think. |
Fix logging deadlock, causing lots of test timeouts. This refactors how logging shutdown is handled, eliminating unnecessary captures, use idiomatic wait group to signal processor completion and remove unnecessary nil initialisation. Fix race condition in log testing which was reading Msg while the processor was still running. Switch to checking GITHUB_RUN_ID environment variable to detect GitHub as XDG_RUNTIME_DIR can be present in other situations.
Fix Docker container Terminate returning early when context is cancelled, which was leaving orphaned running containers. This ensures that all life cycle hooks are run even if one errors returning a multi error if needed.
f5edc6b
to
35cb59d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you for dedicating time to resolve concurrency issues, that are always hard to detect and fix, even more with legacy code, so I appreciate your efforts here 🙇
} | ||
|
||
// applyLifecycleHooks applies all lifecycle hooks reporting the container logs on error if logError is true. | ||
func (c *DockerContainer) applyLifecycleHooks(ctx context.Context, logError bool, hooks func(lifecycleHooks ContainerLifecycleHooks) []ContainerHook) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
* main: (115 commits) chore: create TLS certs in a consistent manner (testcontainers#2478) chore(deps): bump idna from 3.6 to 3.7 (testcontainers#2480) Elasticsearch disable CA retrieval when ssl is disabled (testcontainers#2475) fix: handle dockerignore exclusions properly (testcontainers#2476) chore: prepare for next minor development cycle (0.31.0) chore: use new version (v0.30.0) in modules and examples Fix url creation to handle query params when using HTTP wait strategy (testcontainers#2466) fix: data race on container run (testcontainers#2345) fix: logging deadlock (testcontainers#2346) feat(k6):Add remote test scripts (testcontainers#2350) feat: optimizes file copies to and from containers (testcontainers#2450) fix(exec): updates the `Multiplexed` opt to combine stdout and stderr (testcontainers#2452) Upgrade neo4j module to use features from v0.29.1 of testcontainers-go (testcontainers#2463) bug:Fix AMQPS url (testcontainers#2462) chore: more compose updates in comments chore: use "docker compose" (v2) instead of "docker-compose" (v1) (testcontainers#2464) chore(deps): bump github/codeql-action from 2.22.12 to 3.24.9 (testcontainers#2459) refactor: Add Weaviate modules tests (testcontainers#2447) feat(exitcode): Add exit code sugar method (testcontainers#2342) feat: add module to support InfluxDB v1.x (testcontainers#1703) ...
What does this PR do?
Fix logging deadlock, causing lots of test timeouts.
This refactors how logging shutdown is handled, eliminating unnecessary captures, use idiomatic wait group to signal processor completion and remove unnecessary
nil
initialisation.Fix race condition in log testing which was reading
Msg
while the processor was still running.Switch to checking
GITHUB_RUN_ID
environment variable to detect GitHub asXDG_RUNTIME_DIR
can be present in other situations.@mdelapenya edit:
Why is it important?
Tests and runtime is currently flakey due to race condition during start and stop of logging.
How to test this PR
Testing is hard as it's fixing a random race condition, but tests should now be more reliable.