Add template_id sort to patterned_text track #825

Merged
parkertimmins merged 2 commits into master from parker/add-patterned-text-template-id-sort
Jul 25, 2025

@parkertimmins
Contributor

No description provided.

parkertimmins requested a review from kkrik-es July 22, 2025 23:02
@parkertimmins
Contributor Author

@kkrik-es I need to do some systematic testing of different ways to sort, but this seems like a good first bet.

{% endif %}
{% if patterned_text_message_field | default(false) is true %}
"sort": {
"field": [ "host.name", "message.template_id", "@timestamp" ],
Contributor

I wonder if we should have template_id first. It's fine to submit it as is, but you can also test with template_id first and see if it's better.
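
To make the two orderings under discussion concrete, here is a minimal Python sketch that builds both candidate "sort" fragments. The helper names `sort_block` and `template_first` are hypothetical and exist only for illustration; the fragment shape simply mirrors the snippet under review and is not taken from the track's actual template logic.

```python
import json

# Field order proposed in this PR's snippet.
HOST_FIRST = ["host.name", "message.template_id", "@timestamp"]


def sort_block(fields):
    """Return a "sort" fragment shaped like the one in the reviewed snippet."""
    return {"sort": {"field": list(fields)}}


def template_first(fields):
    """Move message.template_id to the front, keeping the other fields in order."""
    key = "message.template_id"
    return [key] + [f for f in fields if f != key]


# The alternative ordering suggested in review.
TEMPLATE_FIRST = template_first(HOST_FIRST)

print(json.dumps(sort_block(HOST_FIRST)))
print(json.dumps(sort_block(TEMPLATE_FIRST)))
```

Only the order of the first two fields differs between the candidates; `@timestamp` stays last in both.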

@parkertimmins
Contributor Author

I ran elastic/logs with the sort config added in this PR (host.name, message.template_id, @timestamp), as well as with message.template_id, host.name, @timestamp. Here are the results:

message.template_id, host.name, @timestamp sort config:
    total: 29.89gb
    message: 4.18gb
    message.template_id: 0.05gb
    message.template: 0.02gb
    message.args: 1.13gb
    host.name: 0.18gb

host.name, message.template_id, @timestamp sort config (this PR):
    total: 28.98gb
    message: 4.28gb
    message.template_id: 0.14gb
    message.template: 0.07gb
    message.args: 1.16gb
    host.name: 0.05gb

I was expecting the first config to get better compression, but this was not the case. Each of the message fields does get slightly better compression, but this was not enough to make up for the other fields. For example, host.name was over 100mb worse (0.18gb vs. 0.05gb), and presumably many correlated fields were worse as well.
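
The per-field comparison can be reproduced with a short Python sketch. The sizes are copied from the results above, with field names written with the `message.` prefix throughout. The variable and function names are illustrative, and the conversion assumes 1 gb = 1000 mb, which the thread does not specify.

```python
# Sizes in gb, copied from the two result lists in this comment.
template_first = {
    "total": 29.89,
    "message": 4.18,
    "message.template_id": 0.05,
    "message.template": 0.02,
    "message.args": 1.13,
    "host.name": 0.18,
}
host_first = {
    "total": 28.98,
    "message": 4.28,
    "message.template_id": 0.14,
    "message.template": 0.07,
    "message.args": 1.16,
    "host.name": 0.05,
}


def deltas_mb(a, b):
    """Per-field size difference a - b in mb (assumes 1 gb = 1000 mb)."""
    return {field: round((a[field] - b[field]) * 1000) for field in a}


# Positive values mean the template_id-first config used more space.
diff = deltas_mb(template_first, host_first)
for field, mb in diff.items():
    print(f"{field}: {mb:+d} mb")
```

Under these assumptions the message fields each shrink by tens of mb with template_id first, while host.name grows by about 130 mb and the total grows by about 910 mb, which matches the conclusion that the gains on the message fields do not offset the losses elsewhere.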

There are some good gains to be made by improving how message.args is encoded. For example, encoding timestamps directly will provide large gains. But the biggest gains are still to be made by reducing the size of the message inverted index.

parkertimmins merged commit e140dad into master Jul 25, 2025
13 checks passed
jordan-powers added a commit that referenced this pull request Oct 30, 2025
Use the mapping parameter added in elastic/elasticsearch#136571 to sort on
message.template_id instead of manually specifying with index.sort.fields.

Relates to #825.
esbenchmachine added the "backport pending" label (Awaiting backport to stable release branch) Dec 19, 2025
@esbenchmachine
Collaborator

@parkertimmins
A backport is pending for this PR. Please add all required vX.Y version labels.

  • If it is intended for the current Elasticsearch release version, apply the corresponding version label.
  • If it also supports past released versions, add those labels too.
  • If it only targets a future version, wait until that version label exists and then add it.
    (Each rally-tracks version label is created during the feature freeze of a new Elasticsearch branch).

Backporting entails:

  1. Ensure the correct version labels exist in this PR.
  2. Ensure backport PRs have backport label and are passing tests.
  3. Merge backport PRs (you can approve yourself and enable auto-merge).
  4. Remove backport pending label from this PR once all backport PRs are merged.

Thank you!
