Conversation

@joegallo (Contributor) commented Dec 4, 2025

We haven't touched this in quite some time (not since #47374). But there's actually a new feature in joni that we should be taking advantage of: jruby/joni#78 (this PR doesn't do that, though; it's just a version bump).

Interestingly, that previous Elasticsearch PR was also a version bump in order to get access to better timeout logic.

From previous profiling and benchmarking, I know that the grok processor is one of the more time-consuming processors out there. And while executing the regex itself is time-consuming, registering and unregistering the watchdog is surprisingly heavy compared to how long one might imagine it could take (I'm betting due to synchronization overhead), so moving to the native solution without a separate watchdog mechanism would likely buy us a nice little bit of free performance here.
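To make that overhead concrete, here's a minimal sketch of the shape of the hot path today (simplified and paraphrased from Grok#match, not the verbatim implementation): every single match pays a register/unregister round trip against shared watchdog state, on top of the regex search itself.

```java
import org.elasticsearch.grok.MatcherWatchdog;
import org.joni.Matcher;
import org.joni.Option;
import org.joni.Regex;

// Sketch only -- the real Grok#match differs in its details, but the pattern
// is the same: two watchdog calls bracketing every regex search.
static boolean matchWithWatchdog(Regex regex, MatcherWatchdog watchdog, byte[] utf8) {
    Matcher matcher = regex.matcher(utf8);
    watchdog.register(matcher); // shared mutable state, contended across ingest threads
    try {
        return matcher.searchInterruptible(0, utf8.length, Option.DEFAULT) >= 0;
    } catch (InterruptedException e) {
        // the watchdog thread interrupts matchers that run past the max execution time
        Thread.currentThread().interrupt();
        return false;
    } finally {
        watchdog.unregister(matcher); // paid again, on every single document
    }
}
```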

jruby/joni#75 also looks interesting, but I don't know off the top of my head whether it's quite as relevant to our use of joni in Elasticsearch.

@joegallo joegallo requested a review from masseyke December 4, 2025 16:16
@joegallo joegallo requested a review from a team as a code owner December 4, 2025 16:16
@joegallo joegallo added the :Data Management/Ingest Node, Team:Data Management, and v9.3.0 labels Dec 4, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator)

Hi @joegallo, I've created a changelog YAML for you.

@masseyke (Member) left a comment

LGTM

@masseyke (Member) commented Dec 4, 2025

It would be useful to attach some kind of performance comparison though.

@joegallo (Contributor, Author) commented Dec 5, 2025

Top Processors by Type:
=======================
                       count  time_in_millis  time_in_nanos  percent
processor
grok                50174960          616200          12282    38.7%
grok                50174960          633588          12628    39.4%
grok                50174960          583334          11626    38.6%

Here's an amalgamation of three runs. The first row is stock Elasticsearch; the second row is with the joni upgrade, and is slightly slower (possibly within the noise for these benchmarks, 🤷); the third row is with the joni upgrade plus a quick WIP swing at pulling out the MatcherWatchdog and replacing it with the new built-in timeout feature from joni. The only values you can compare across the runs are time_in_millis and time_in_nanos (per document); the percentages are meaningless in this output.
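(Sanity check on the units, in case it's not obvious: time_in_nanos is the per-document average, i.e. time_in_millis converted to nanos and divided by count. For the first row:)

```java
// Sanity check: time_in_nanos == time_in_millis scaled to nanos, per document.
public class PerDocNanos {
    public static void main(String[] args) {
        long docs = 50_174_960L;     // count, first row
        long totalMillis = 616_200L; // time_in_millis, first row
        System.out.println(totalMillis * 1_000_000L / docs); // 12281 (~ the 12282 above)
    }
}
```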

@joegallo joegallo merged commit ff9e709 into elastic:main Dec 5, 2025
35 checks passed
@joegallo joegallo deleted the bump-joni branch December 5, 2025 12:44
@joegallo (Contributor, Author) commented Dec 5, 2025

The purple part is the watchdog portion of a Grok#match call, and that's what will drop out once we switch to the native timeout solution:

[Screenshot: profiler flame graph of Grok#match, 2025-12-05 8:39:53 AM]
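For comparison, the "after" would be roughly the following shape. Note that searchWithTimeout is a stand-in name, not the actual API from jruby/joni#78 (I'm not committing to its spelling here); the point is just that the deadline is enforced inside the matcher, so the per-match register/unregister round trip disappears:

```java
import org.joni.Matcher;
import org.joni.Option;
import org.joni.Regex;

// Hypothetical sketch: `searchWithTimeout` stands in for whatever joni's real
// native-timeout entry point turns out to be (jruby/joni#78). No external
// watchdog registration at all.
static boolean matchWithNativeTimeout(Regex regex, byte[] utf8, long timeoutNanos) {
    Matcher matcher = regex.matcher(utf8);
    return matcher.searchWithTimeout(0, utf8.length, Option.DEFAULT, timeoutNanos) >= 0;
}
```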

mamazzol pushed a commit to mamazzol/elasticsearch that referenced this pull request Dec 5, 2025
