Skip to content

Conversation

@wiibaa
Copy link
Contributor

@wiibaa wiibaa commented Apr 11, 2014

When using the date filter with UNIX or UNIX_MS format, if an invalid value is given like a not-resolved sprintf %{mytimestamp} , it is evaluated to zero (thanks String#to_i) and thus reset @timestamp to EPOCH time.

Root cause of #1236 and also mentionned in LOGSTASH-1597

Bugfix + spec!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it's possible that data be a Numeric here. We are dealing with strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, is /\d+/ === date too relax? it would match on "a1b" or "a1s2d3" etc, maybe /^\d+$/ ?

@colinsurprenant colinsurprenant added this to the v1.4.2 milestone May 10, 2014
@colinsurprenant
Copy link
Contributor

Thanks for you contribution. See inline comments. As noted, this would also solve issue #1236

@wiibaa
Copy link
Contributor Author

wiibaa commented May 11, 2014

@colinsurprenant I applied you proposal for the code format.
However I kept the numeric check as it was present in the existing spec, I suppose it can come from a grok conversion or a mutate or even a json input.
If everything was really only string it would be too simple ;)

@colinsurprenant colinsurprenant self-assigned this May 15, 2014
@colinsurprenant
Copy link
Contributor

ok, thanks, will review & test locally shortly.

@jordansissel jordansissel modified the milestones: 1.4.3, v1.4.2 Jun 17, 2014
@wiibaa
Copy link
Contributor Author

wiibaa commented Jun 23, 2014

@colinsurprenant I have rebased this PR against master, hoping I did it correctly.

Also it seems that the spec file is not fully consistent on how it calls and test the timestamp value like

Two kind of access to timestamp

insist { subject.timestamp.time } ==
insist { subject["@timestamp"].time } ==

Missing .time in non-equals test

insist { subject["@timestamp"] } !=

it would need a 2nd-pass on the complete file, a little out of scope of this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wiibaa Can you make this non capturing group? Like /^\d+(?:\.\d+)?$/ Will give it some performance gain

@wiibaa
Copy link
Contributor Author

wiibaa commented Jul 30, 2014

@suyograo done

@elasticsearch-release
Copy link

Can one of the admins verify this patch?

@suyograo
Copy link
Contributor

LGTM

@suyograo suyograo closed this in 0f79064 Jul 30, 2014
@wiibaa wiibaa deleted the date-unix branch July 31, 2014 05:28
@jsvd jsvd added the v1.x label Sep 9, 2014
@colinsurprenant
Copy link
Contributor

@suyograo @wiibaa did we run a benchmark before/after to see the impact of doing a regex check at every parse?

@jsvd jsvd modified the milestones: v1.5.0, 1.4.3 Sep 9, 2014
@suyograo
Copy link
Contributor

suyograo commented Sep 9, 2014

I'll check, but using non capturing groups was pretty quick. Also we could flip the order of unless /^\d+(?:\.\d+)?$/ === date || date.is_a?(Numeric)

@wiibaa
Copy link
Contributor Author

wiibaa commented Sep 10, 2014

@colinsurprenant I duplicated logic in spec/filters/date_performance to get events/sec ratio for UNIX and UNIX_MS parsers with both kind of entry string vs numeric, and executed a few time locally to get an average value for each possibilities. here are the results:

With validation the main reduction in speed is when parsing string, circa -1400evt/sec
With @suyograo proposal to flip the conditions, we mitigate the validation cost for integer value but it increases a little more the string validation cost.

Is it sufficient for you or do you use different solution internally for benchmarking ?

image

@wiibaa
Copy link
Contributor Author

wiibaa commented Sep 11, 2014

@colinsurprenant @suyograo I though more about this, the initial issue is that String#to_i String#to_f never fails but return 0 when the string is not a valid number, an alternative to regex would be to use Integer() and Float() that internally raise an exception

when "UNIX" # unix epoch
  parsers << lambda do |date|
    (Float(date) * 1000).to_i
  end
when "UNIX_MS" # unix epoch in ms
  parsers << lambda do |date|
    Integer(date)
  end

the raised exception would look like:

Failed parsing date from field {:field=>"mydate", :value=>"%{bad_value}", :exception=>#<ArgumentError: invalid value for Float(): "%{bad_value}">,

and the mini-benchmark
image

so except a small loss for string parsing with UNIX pattern the other cases validation is for free
What do you think?

@colinsurprenant
Copy link
Contributor

I'm a bit confused, isn't the isNumeric approach the fastest?

Also, you might want to look at https://github.com/elasticsearch/logstash/tree/master/test/integration for performing benchmarks with real config and real data.

@wiibaa
Copy link
Contributor Author

wiibaa commented Nov 14, 2014

@colinsurprenant in fact I discovered a better approach, according to my testing, after this PR was merged, explaining the monologue after this PR has been closed...
I will try to use your benchmarks and if relevant do a PR direclty on the date filter repo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants