Logstash Crash with ConcurrencyError: interrupted waiting for mutex: null #26
Comments
I don't yet agree fully with the problem description. Your sample code is clear, thank you for that! However, my main concern is that the dns filter is interrupted at all -- what causes this? Some notes from reading the code: seems like The problem I see is that we are using the Some possible solutions I think of:
|
That is why I love discussion 👍. I agree the timeout could be what gets this thread killed, but I guess the problem is trickier to debug: even with a very small timeout value we did not manage to reproduce this issue with a Logstash pipeline. I have an open question:
I agree your points are fair as solutions too, but I also have a few open questions: re: avoid using timeout:
re : rescue
This is basically why I went for the mutex: to "protect" the resolve operation and make it thread-safe. If we decide to remove the mutex and go for one of your proposals, I would think more about What do you think? |
The Resolv library appears to support timeouts for DNS lookups: http://ruby-doc.org/stdlib-2.1.0/libdoc/resolv/rdoc/Resolv/DNS.html#method-i-timeouts-3D |
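For illustration, a minimal sketch of what using the Resolv-level timeouts could look like, assuming a Ruby version where `Resolv::DNS#timeouts=` is available (2.0 or later). The helper name `resolve_with_timeouts` and the timeout values are hypothetical and are not taken from the plugin or the PR:

```ruby
require "resolv"

# Hypothetical helper (not from the dns filter): resolve a hostname while
# letting Resolv itself bound how long each lookup attempt may wait,
# instead of wrapping the call in Timeout.timeout.
def resolve_with_timeouts(hostname, timeouts = [0.5, 1.0])
  Resolv::DNS.open do |dns|
    # Seconds to wait per attempt; the values here are assumptions.
    dns.timeouts = timeouts
    dns.getaddress(hostname).to_s
  end
rescue Resolv::ResolvError, Resolv::ResolvTimeout
  # Lookup failed or timed out; the caller decides how to handle nil.
  nil
end
```

Calling `resolve_with_timeouts("example.org")` would return the address as a string, or `nil` when the lookup fails or times out. Because the waiting is bounded inside Resolv, no external thread has to interrupt the lookup.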
@jordansissel this is a very good idea, will update my proposal with this. |
I reviewed the code and I wonder where we are with the move to the 2.0 codebase. I remember some discussion with @andrewvc about this, but I am not sure whether a direction was decided. I ask because this feature is only available in 2.0; however, we could patch 1.9 to change this value and see how it works. My preference would be not to move to 2.0, as we need time to test, but I am open to proposals for sure. I will update the PR with the patch solution for 1.9 and wait for your input. What do you think? |
@jordansissel updated the PR with a proposal to remove the timeout module, let me know what you think. |
We won't be moving to 2.0 (and also jruby 9k) until after 5.0.0 |
@jordansissel then my current proposal at #27 is good, as I patched resolv in 1.9 to introduce these timeouts. Let me know how it works for you. Does that make sense? |
FYI I looked at this a bit in jruby/jruby#4048. The example there does indeed produce the same error, but I'd expect it to produce that error since the native thread is being interrupted while it waits on a mutex. The real problem here is finding why that happens at runtime in Logstash. We need to trap the interrupt somehow. |
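A rough sketch of the situation described above, where a thread is interrupted while it waits on a mutex. This is not the code from jruby/jruby#4048; the variable names and sleep durations are assumptions chosen only to make the interruption happen while the lock is still held:

```ruby
require "timeout"

lock = Mutex.new

# The first thread takes the lock and holds it for a while,
# standing in for an instance that is stuck during setup.
holder = Thread.new { lock.synchronize { sleep 0.5 } }
sleep 0.1 # give the holder time to acquire the lock

waiter_error = nil
waiter = Thread.new do
  begin
    # The timeout fires while this thread is still blocked on the mutex.
    Timeout.timeout(0.1) { lock.synchronize { :resolved } }
  rescue Exception => e
    # Capture whatever the runtime raises so it can be inspected below.
    waiter_error = e
  end
end

waiter.join
holder.join
puts "waiter was interrupted with: #{waiter_error.class}"
```

The exception class that is printed depends on the Ruby runtime and version, so no specific output is claimed here.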
Do you have any updates about this issue? |
Same issue happening here, or at least something similar. Logstash 2.4.0. Error message:
|
Has this issue been resolved in any version? |
Under a specific set of circumstances there might be a concurrency problem while processing the /etc/hosts file to set up the necessary resolvers. This situation might only happen when one filter instance is stuck setting up the hosts and the thread of another instance gets killed while waiting, making this tricky to reproduce with a live LS instance.
The condition could be reproduced independently of logstash using https://gist.github.com/purbon/106e80a5d85b3a6fbf1e5f10b5d0d643 code.
The exception reported is:
This problem is independent of LS code; it has been reported in several different environments, see [1] and [2] as examples.
Plan of action: