Skip to content

Conversation

@joegallo
Copy link
Contributor

This is half a code tidiness change (it irks me every time I see in these code paths that we're doing this), and half a performance improvement (it irks me that we're doing it because it is slower).

Consider that basically every ingest processor gets a value from the document, does something to that value, and then sets the value somewhere in the document. On the get side and on the set side, we're paying a map lookup cost twice that we could just be paying once.

In the case of the map in question being a CtxMap (which it pretty much always is for the first key of a path traversal), we were also paying the cost of calling metadata.isAvailable(str) twice, too.

Note that this same trick was already applied to to Metadata itself in a previous PR (specifically #93333).

Anyway, this doesn't make an enormous different on any one processor or pipeline, but it speeds everything up just a little. For example, microbenchmarking on my machine, it speeds up an example rename processor by 20%, but that's precisely because a rename processor is so fast that the cost of the wasted map lookups actually matters. On a real workload this is probably a modest one or two percent improvement in overall ingest processing time, but that would depend on the workload, of course.

Here's some screenshots:

Screenshot 2025-01-21 at 9 48 25 PM Screenshot 2025-01-21 at 9 48 46 PM

@joegallo joegallo added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >refactoring Team:Data Management Meta label for data/management team v9.0.0 v8.18.0 labels Jan 22, 2025
@joegallo joegallo requested a review from nielsbauman January 22, 2025 03:32
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Contributor

@nielsbauman nielsbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@joegallo joegallo added the auto-backport Automatically create backport pull requests when merged label Jan 24, 2025
@joegallo joegallo merged commit 83dd34f into elastic:main Jan 24, 2025
16 checks passed
@joegallo joegallo deleted the ingest-document-get-or-default branch January 24, 2025 16:39
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >refactoring Team:Data Management Meta label for data/management team v8.18.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants