Fluentd stuck/hangs because of infinite regexp (99.9%), please improve detection/validation #2464
Comments
At the same time, Logstash has no issues consuming my 2 log files with the same regex.
Hmm... to debug this, we need your actual log for the investigation.
Cool! I'll send it to you privately ([email protected]) when I get approval from my PM/TL.
What about the SIGCONT signal during the stuck state? Any thoughts on why it always works, but not in my situation?
This is a log content / regexp combo issue, or your regexp hits a bug in Ruby's regexp engine.
@repeatedly look, Logstash has GROK. GROK has a timeout on each regex matching operation, which can be tuned. Logstash also logs these unmatched/timed-out lines to a separate log file for later investigation and for updating the regex that fails during processing. Without this ability you can't develop log parsing patterns, right? I mean, how can I understand which string from my multi-format log made Fluentd get stuck? I just suggest making improvements and adding timeouts & logging for those users who are using regex patterns! Also, since the problem is Fluentd hanging on infinite regex loops, this improvement I suggest can solve all these problems with hanging/stuck processing of input records!
I was talking about this feature: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-timeout_millis Also, please take a look at these features:
They are so helpful! Thanks!
grok is a collection of regexps, so I assume Logstash also has a similar regexp problem.
Wrote a patch for this issue: #2513
parser: Add timeout parameter to avoid parsing stuck. fix #2464
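For reference, a minimal sketch of how that parameter would presumably be configured once it ships, assuming it is exposed as `timeout` inside the <parse> section (the name is taken from the commit title for #2513 and is not verified against released docs; the path, tag, and expression below are placeholders):

```
<source>
  @type tail
  path /var/log/app/app.log              # placeholder path
  pos_file /var/log/td-agent/app.log.pos # placeholder pos_file
  tag app.raw
  <parse>
    @type regexp
    expression /^(?<time>[^ ]+) (?<level>[^ ]+) (?<message>.*)$/
    # assumed parameter from #2513: abort a single parse that runs longer
    # than this instead of letting the worker hang on a runaway regexp
    timeout 1s
  </parse>
</source>
```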
First of all, I'd like to thank all the developers who made this piece of software work for at least 23 hours in production without any issues. My issue happened after 24 hours.
Describe the bug
I'm using the 'multi_format' plugin inside a 'tail' source, which was made by @repeatedly.
My "parse".."/parse" code inside 'tail' plugin looks like this structure, I have 10-15 various expressions to catch my multi-format logs:
Every day the production fluentd process gets stuck/hangs and consumes 100% CPU.
But it's not just about high CPU consumption; fluentd also stops consuming/tailing my logs at the same time, which is horrible for my 200 production bare-metal servers, you know.
I have only 2 log files at the same time, just 2 files which I want this software to consume:
When Fluentd hangs/gets stuck, I am not able to get a report by sending the SIGCONT signal to the PID.
strace shows me the same data whether I strace a well-working fluentd or a stuck one:
But pstack shows me interesting information every time, even if I stop/start fluentd:
(look at lines 0 and 1)
This makes me think Fluentd has no ability to detect infinite regexp loops, which is handled by https://rubular.com/, for example.
To Reproduce
I can privately send you my config and a log with the unknown log string which makes fluentd hang.
Expected behavior
Warn to the log file and continue processing logs.
The warning must include the entire log string, like you do with unmatched strings.
There should be an ability to match these logs and apply another regexp (i.e. retry with a simpler regexp), as sketched below.
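To make the expected behavior concrete, here is a sketch of what I have in mind, combining a hypothetical per-pattern timeout with a simpler fallback regexp and a catch-all `none` pattern (the `timeout` parameter inside <pattern> is my assumption of how it could look, not an existing option; the expressions are placeholders):

```
<parse>
  @type multi_format
  <pattern>
    format regexp
    # primary pattern for well-formed lines
    expression /^(?<time>[^ ]+) \[(?<level>\w+)\] (?<message>.*)$/
    # hypothetical: give up on this pattern after 1s, warn with the
    # offending line, and fall through to the next pattern
    timeout 1s
  </pattern>
  <pattern>
    format regexp
    # simpler retry regexp
    expression /^(?<message>.+)$/
  </pattern>
  <pattern>
    # last resort: keep the raw line so nothing blocks the pipeline
    format none
  </pattern>
</parse>
```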
Your Environment
# rpm -qa | grep td-agent
td-agent-3.4.1-0.el7.x86_64
Also tested with Fluentd 1.4 and 1.5 (and other versions back to 1.0), same issue.
I don't know why the old Ruby version 2.4.6 is used in production, when it's EOL:
# /opt/td-agent/embedded/bin/ruby -v
ruby 2.4.6p354 (2019-04-01 revision 67394) [x86_64-linux]
Operating system:
# cat /etc/os-release
NAME="CentOS Linux" VERSION="7 (Core)"
Kernel version:
# uname -r
4.17.8-1.el7.elrepo.x86_64
Your Configuration
Your Error Log