filter/date: Reject invalid UNIX timestamp #1253
lib/logstash/filters/date.rb
Outdated
I do not think data can be a Numeric here. We are dealing with strings.
Also, isn't /\d+/ === date too relaxed? It would match "a1b" or "a1s2d3", etc. Maybe /^\d+$/?
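To illustrate the point about the relaxed pattern, here is a small sketch comparing the two regexes from the comment above. The sample strings and variable names are made up for illustration and are not taken from the filter code. It also shows, as a side note, that in Ruby `^` and `$` anchor to line boundaries, so `\A` and `\z` are the stricter choice when the whole string must be numeric.

```ruby
# Patterns quoted in the review comment.
loose    = /\d+/
anchored = /^\d+$/
# Stricter variant (not proposed in this PR): anchors to the whole string.
strict   = /\A\d+\z/

# Hypothetical sample inputs, for illustration only.
samples = ["1400000000", "a1b", "a1s2d3", "%{mytimestamp}"]

# The loose pattern accepts any string containing at least one digit.
loose_hits    = samples.select { |s| loose === s }
# The anchored pattern only accepts strings made of digits.
anchored_hits = samples.select { |s| anchored === s }

# ^ and $ match at line boundaries, so a multi-line string can still slip through.
multiline = "abc\n123"
anchored_accepts_multiline = anchored === multiline
strict_accepts_multiline   = strict === multiline

puts "loose:    #{loose_hits.inspect}"
puts "anchored: #{anchored_hits.inspect}"
puts "multiline accepted by anchored? #{anchored_accepts_multiline}"
puts "multiline accepted by strict?   #{strict_accepts_multiline}"
```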
|
Thanks for your contribution. See inline comments. As noted, this would also solve issue #1236 |
|
@colinsurprenant I applied your proposal for the code format. |
|
ok, thanks, will review & test locally shortly. |
|
@colinsurprenant I have rebased this PR against master, hoping I did it correctly. Also, the spec file does not seem fully consistent in how it accesses and tests the timestamp value: there are two kinds of access to the timestamp, and .time is missing in the non-equals test. It would need a second pass over the complete file, which is a little out of scope for this PR. |
lib/logstash/filters/date.rb
Outdated
@wiibaa Can you make this a non-capturing group, like /^\d+(?:\.\d+)?$/? It will give some performance gain.
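As a rough sketch of what the suggestion changes: both patterns below accept the same strings, but the non-capturing form does not record the fractional part as a group. The capturing variant and the sample inputs are assumptions made for comparison; only the non-capturing regex is quoted from the comment above.

```ruby
# Assumed "before" form with a capturing group, for comparison only.
capturing     = /^\d+(\.\d+)?$/
# Form suggested in the review comment.
non_capturing = /^\d+(?:\.\d+)?$/

# Hypothetical sample inputs.
samples = ["1400000000", "1400000000.123", "1400000000.", ".5", "a1b"]

# Both patterns should accept exactly the same inputs.
accepted_capturing     = samples.select { |s| capturing.match(s) }
accepted_non_capturing = samples.select { |s| non_capturing.match(s) }

# The difference is only in what the MatchData records.
captures_with_group    = capturing.match("1400000000.123").captures
captures_without_group = non_capturing.match("1400000000.123").captures

puts "accepted: #{accepted_non_capturing.inspect}"
puts "captures (capturing):     #{captures_with_group.inspect}"
puts "captures (non-capturing): #{captures_without_group.inspect}"
```

Since the validation only needs a yes/no answer, nothing is lost by dropping the group.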
|
@suyograo done |
|
Can one of the admins verify this patch? |
|
LGTM |
|
I'll check, but using non capturing groups was pretty quick. Also we could flip the order of |
|
@colinsurprenant I duplicated the logic in spec/filters/date_performance to get the events/sec ratio for the UNIX and UNIX_MS parsers with both kinds of input, string vs. numeric, and ran it a few times locally to get an average value for each possibility. Here are the results: With validation, the main reduction in speed is when parsing strings, circa -1400 evt/sec. Is that sufficient for you, or do you use a different solution internally for benchmarking? |
|
@colinsurprenant @suyograo I thought more about this. The initial issue is that String#to_i and String#to_f never fail but return 0 when the string is not a valid number. An alternative to the regex would be to use Integer() and Float(), which internally raise an exception. The raised exception would look like: So, except for a small loss when parsing strings with the UNIX pattern, validation is free in the other cases. |
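A minimal sketch of the Integer()/Float() alternative described above, assuming a hypothetical helper named `strict_seconds` that is not part of the filter. `Kernel#Float` raises `ArgumentError` for a non-numeric string (and `TypeError` for `nil`), so the invalid value can be rejected instead of silently becoming 0.

```ruby
# Hypothetical helper, for illustration only: returns the value as a Float,
# or nil when it is not a valid number.
def strict_seconds(value)
  Float(value)
rescue ArgumentError, TypeError
  nil
end

valid   = strict_seconds("1400000000.5")
invalid = strict_seconds("%{mytimestamp}")
missing = strict_seconds(nil)

# Integer() raises in the same way for a non-numeric string.
integer_raised = begin
  Integer("%{mytimestamp}")
  false
rescue ArgumentError
  true
end

puts "valid:   #{valid.inspect}"
puts "invalid: #{invalid.inspect}"
puts "missing: #{missing.inspect}"
puts "Integer() raised ArgumentError? #{integer_raised}"
```

One caveat worth checking before relying on this: `Integer()` honours radix prefixes, so a string with a leading zero such as "010" is read as octal unless the base is passed explicitly, as in `Integer("010", 10)`.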
|
I'm a bit confused, isn't the isNumeric approach the fastest? Also, you might want to look at https://github.com/elasticsearch/logstash/tree/master/test/integration for performing benchmarks with real config and real data. |
|
@colinsurprenant In fact, according to my testing, I discovered a better approach after this PR was merged, which explains the monologue after the PR was closed...


When using the date filter with the UNIX or UNIX_MS format, an invalid value, such as an unresolved sprintf reference like %{mytimestamp}, is evaluated to zero (thanks to String#to_i) and thus resets @timestamp to the epoch.
This is the root cause of #1236 and is also mentioned in LOGSTASH-1597.
Bugfix + spec!
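
The failure mode in the description can be reproduced in plain Ruby, outside Logstash. This is only an illustration of the String#to_i behaviour, not the filter's actual code path, and the variable names are made up.

```ruby
# An unresolved sprintf reference reaches the parser as a literal string.
unresolved = "%{mytimestamp}"

# String#to_i never fails: with no leading digits it returns 0.
seconds = unresolved.to_i

# Interpreting 0 as UNIX seconds yields the epoch.
timestamp = Time.at(seconds).utc

puts "seconds:   #{seconds}"
puts "timestamp: #{timestamp}"
```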