Use getOrDefault in IngestDocument rather than containsKey+get #120571
This is half a code tidiness change (it irks me every time I see in these code paths that we're doing this), and half a performance improvement (it irks me that we're doing it because it is slower).
Consider that basically every ingest processor gets a value from the document, does something to that value, and then sets the value somewhere in the document. On the get side and on the set side, we're paying a map lookup cost twice that we could just be paying once.
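As a rough illustration of the double lookup described above, here is a minimal sketch on a plain `HashMap`. It is not the actual `IngestDocument` code: the class name, method names, and the `NOT_FOUND` sentinel are hypothetical and chosen only for this example. The sentinel is there because a field can legitimately be present with a `null` value, so a `null` result alone cannot signal "missing".

```java
import java.util.HashMap;
import java.util.Map;

public class LookupSketch {
    // Hypothetical sentinel: a unique object that can never be a real value stored in the map.
    private static final Object NOT_FOUND = new Object();

    // Before: two lookups on the hit path (containsKey, then get).
    public static Object getTwoLookups(Map<String, Object> map, String key) {
        if (map.containsKey(key)) {
            return map.get(key);
        }
        throw new IllegalArgumentException("field [" + key + "] not present");
    }

    // After: a single lookup; the sentinel distinguishes "absent" from "present but null".
    public static Object getOneLookup(Map<String, Object> map, String key) {
        Object value = map.getOrDefault(key, NOT_FOUND);
        if (value == NOT_FOUND) {
            throw new IllegalArgumentException("field [" + key + "] not present");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("message", "hello");
        doc.put("optional", null); // present, but null

        System.out.println(getOneLookup(doc, "message"));  // prints "hello"
        System.out.println(getOneLookup(doc, "optional")); // prints "null"
    }
}
```

Both methods return the same results for present keys (including a present `null`) and throw for missing ones; the second simply does it with one map access instead of two.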
In the case of the map in question being a `CtxMap` (which it pretty much always is for the first key of a path traversal), we were also paying the cost of calling `metadata.isAvailable(str)` twice.
Note that this same trick was already applied to `Metadata` itself in a previous PR (specifically #93333).
Anyway, this doesn't make an enormous difference on any one processor or pipeline, but it speeds everything up just a little. For example, microbenchmarking on my machine, it speeds up an example `rename` processor by 20%, but that's precisely because a rename processor is so fast that the cost of the wasted map lookups actually matters. On a real workload this is probably a modest one or two percent improvement in overall ingest processing time, but that would depend on the workload, of course.
Here are some screenshots: