@nchaulet commented Aug 25, 2020

Description

Currently, fetching the unacknowledged actions for an agent is a performance bottleneck because it requires parsing a KQL query (see #75646).

This PR changes our data model to avoid that KQL query.

Changes made in this PR:

  • Denormalize our model and add a new property not_acknowledged_actions on the agent, so we no longer need a search to find the unacknowledged actions for an agent.
  • Rename sent_at => acknowledged_at in the AgentAction schema, as it's more accurate.
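
The denormalization described above could look roughly like the sketch below. This is only an illustration under assumptions: the `Agent` and `AgentAction` shapes are simplified, the choice to store action ids in `not_acknowledged_actions` is a guess, and the helper names (`addAction`, `acknowledge`, `pendingActionIds`) are hypothetical and not taken from this PR.

```typescript
// Hypothetical, simplified shapes; the real saved-object types may differ.
interface AgentAction {
  id: string;
  agent_id: string;
  type: string;
  acknowledged_at?: string; // renamed from sent_at
}

interface Agent {
  id: string;
  // Assumption: the denormalized property holds the ids of pending actions.
  not_acknowledged_actions: string[];
}

// Creating an action also records its id on the agent document.
function addAction(agent: Agent, action: AgentAction): Agent {
  return {
    ...agent,
    not_acknowledged_actions: [...agent.not_acknowledged_actions, action.id],
  };
}

// Acknowledging removes the ids from the agent and timestamps the actions.
function acknowledge(
  agent: Agent,
  actions: AgentAction[],
  ackedIds: string[],
  now: string
): { agent: Agent; actions: AgentAction[] } {
  const isAcked = (id: string) => ackedIds.indexOf(id) !== -1;
  return {
    agent: {
      ...agent,
      not_acknowledged_actions: agent.not_acknowledged_actions.filter(
        (id) => !isAcked(id)
      ),
    },
    actions: actions.map((a) =>
      isAcked(a.id) ? { ...a, acknowledged_at: now } : a
    ),
  };
}

// Reading pending actions becomes a property lookup instead of a KQL search.
function pendingActionIds(agent: Agent): string[] {
  return agent.not_acknowledged_actions;
}
```

The trade-off in a design like this is that every create and acknowledge has to update the agent document as well, in exchange for a read path that needs no search.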

Load test

2000 agents

Before

2020/08/25 10:35:17 timer requests.healthcheck.latency
2020/08/25 10:35:17   count:            1052
2020/08/25 10:35:17   min:                46.28ms
2020/08/25 10:35:17   max:              2816.12ms
2020/08/25 10:35:17   mean:              297.97ms
2020/08/25 10:35:17   stddev:            320.69ms
2020/08/25 10:35:17   median:            217.33ms
2020/08/25 10:35:17   75%:               280.85ms
2020/08/25 10:35:17   95%:               723.66ms
2020/08/25 10:35:17   99%:              2140.05ms
2020/08/25 10:35:17   99.9%:            2801.34ms
2020/08/25 10:35:17   1-min rate:          1.29
2020/08/25 10:35:17   5-min rate:          1.28
2020/08/25 10:35:17   15-min rate:         1.32
2020/08/25 10:35:17   mean rate:           1.25
2020/08/25 10:35:17 counter requests.healthcheck.concurrent_count
2020/08/25 10:35:17   count:               1
2020/08/25 10:35:17 meter requests.healthcheck.success
2020/08/25 10:35:17   count:            1052
2020/08/25 10:35:17   1-min rate:          1.29
2020/08/25 10:35:17   5-min rate:          1.28
2020/08/25 10:35:17   15-min rate:         1.32
2020/08/25 10:35:17   mean rate:           1.25
2020/08/25 10:35:17 Policy revision summary
2020/08/25 10:35:17   revision  1:   2000 agents

After

2020/08/25 10:06:44 timer requests.healthcheck.latency
2020/08/25 10:06:44   count:            1061
2020/08/25 10:06:44   min:                44.09ms
2020/08/25 10:06:44   max:              2202.55ms
2020/08/25 10:06:44   mean:              254.06ms
2020/08/25 10:06:44   stddev:            273.90ms
2020/08/25 10:06:44   median:            205.27ms
2020/08/25 10:06:44   75%:               232.60ms
2020/08/25 10:06:44   95%:               500.66ms
2020/08/25 10:06:44   99%:              1689.15ms
2020/08/25 10:06:44   99.9%:            2197.51ms
2020/08/25 10:06:44   1-min rate:          1.32
2020/08/25 10:06:44   5-min rate:          1.32
2020/08/25 10:06:44   15-min rate:         1.35
2020/08/25 10:06:44   mean rate:           1.33
2020/08/25 10:06:44 counter requests.healthcheck.concurrent_count
2020/08/25 10:06:44   count:               0
2020/08/25 10:06:44 meter requests.healthcheck.success
2020/08/25 10:06:44   count:            1061
2020/08/25 10:06:44   1-min rate:          1.32
2020/08/25 10:06:44   5-min rate:          1.32
2020/08/25 10:06:44   15-min rate:         1.35
2020/08/25 10:06:44   mean rate:           1.33
2020/08/25 10:06:44 Policy revision summary
2020/08/25 10:06:44   revision  1:   2000 agents

4000 agents

Before

2020/08/25 11:06:34 counter requests.healthcheck.concurrent_count
2020/08/25 11:06:34   count:               1
2020/08/25 11:06:34 meter requests.healthcheck.success
2020/08/25 11:06:34   count:            1575
2020/08/25 11:06:34   1-min rate:          1.02
2020/08/25 11:06:34   5-min rate:          0.96
2020/08/25 11:06:34   15-min rate:         1.03
2020/08/25 11:06:34   mean rate:           0.96
2020/08/25 11:06:34 timer requests.healthcheck.latency
2020/08/25 11:06:34   count:            1575
2020/08/25 11:06:34   min:                49.46ms
2020/08/25 11:06:34   max:              4281.60ms
2020/08/25 11:06:34   mean:              562.76ms
2020/08/25 11:06:34   stddev:            800.22ms
2020/08/25 11:06:34   median:            313.69ms
2020/08/25 11:06:34   75%:               477.01ms
2020/08/25 11:06:34   95%:              3045.95ms
2020/08/25 11:06:34   99%:              3669.76ms
2020/08/25 11:06:34   99.9%:            4279.77ms
2020/08/25 11:06:34   1-min rate:          1.02
2020/08/25 11:06:34   5-min rate:          0.96
2020/08/25 11:06:34   15-min rate:         1.03
2020/08/25 11:06:34   mean rate:           0.96
2020/08/25 11:06:34 Agent rollout
2020/08/25 11:06:34   agents:  4000
2020/08/25 11:06:34 Policy revision summary
2020/08/25 11:06:34   revision  1:   4000 agents

After

2020/08/25 11:42:32 meter requests.healthcheck.success
2020/08/25 11:42:32   count:            1831
2020/08/25 11:42:32   1-min rate:          1.16
2020/08/25 11:42:32   5-min rate:          1.10
2020/08/25 11:42:32   15-min rate:         1.06
2020/08/25 11:42:32   mean rate:           1.07
2020/08/25 11:42:32 counter requests.healthcheck.concurrent_count
2020/08/25 11:42:32   count:               1
2020/08/25 11:42:32 timer requests.healthcheck.latency
2020/08/25 11:42:32   count:            1831
2020/08/25 11:42:32   min:                44.90ms
2020/08/25 11:42:32   max:              3568.79ms
2020/08/25 11:42:32   mean:              430.63ms
2020/08/25 11:42:32   stddev:            665.81ms
2020/08/25 11:42:32   median:            229.46ms
2020/08/25 11:42:32   75%:               310.05ms
2020/08/25 11:42:32   95%:              2580.73ms
2020/08/25 11:42:32   99%:              3182.45ms
2020/08/25 11:42:32   99.9%:            3566.68ms
2020/08/25 11:42:32   1-min rate:          1.16
2020/08/25 11:42:32   5-min rate:          1.10
2020/08/25 11:42:32   15-min rate:         1.06
2020/08/25 11:42:32   mean rate:           1.07
2020/08/25 11:42:32 Agent rollout
2020/08/25 11:42:32   agents:  4000
2020/08/25 11:42:32 Policy revision summary
2020/08/25 11:42:32   revision  1:   4000 agents
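
To make the before/after numbers easier to compare, here is a small sketch that computes the relative latency reduction from the timer values in the logs above. The helper `reductionPercent` is introduced here for illustration only, and each figure comes from a single run, so treat the percentages as indicative rather than conclusive.

```typescript
// Relative reduction, in percent, between a "before" and an "after" latency (ms).
function reductionPercent(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Values copied from the requests.healthcheck.latency timers in the logs.
const runs = [
  { agents: 2000, metric: "mean", before: 297.97, after: 254.06 },
  { agents: 2000, metric: "95%", before: 723.66, after: 500.66 },
  { agents: 4000, metric: "mean", before: 562.76, after: 430.63 },
  { agents: 4000, metric: "95%", before: 3045.95, after: 2580.73 },
];

for (const r of runs) {
  const pct = reductionPercent(r.before, r.after).toFixed(1);
  console.log(`${r.agents} agents, ${r.metric}: ${pct}% lower`);
}
// 2000 agents, mean: 14.7% lower
// 2000 agents, 95%: 30.8% lower
// 4000 agents, mean: 23.5% lower
// 4000 agents, 95%: 15.3% lower
```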

TODO

  • Add SO migrations for sent_at field

@nchaulet added the v8.0.0, release_note:skip, v7.10.0, and Team:Fleet labels Aug 25, 2020
@nchaulet self-assigned this Aug 25, 2020
@nchaulet marked this pull request as ready for review August 25, 2020 15:57
@nchaulet requested a review from a team August 25, 2020 15:57
@elasticmachine
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@jen-huang
Contributor

How will this affect agent actions created in 7.9? I see saved object mapping changes. Do migrations need to be added if there are documents from 7.9?

@nchaulet
Member Author

@jen-huang it will affect actions from 7.9. I am planning to write the migration; we should have a happy path for migrating the rename of sent_at to acknowledged_at.
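
For illustration, the rename could be handled by a per-document transform along these lines. This is a minimal sketch under assumptions: `SavedObjectDoc` is a simplified stand-in for a saved object document, `renameSentAt` is a hypothetical name, and the function is not wired into Kibana's saved-object migration registration.

```typescript
// Simplified stand-in for a saved object document (assumption, not Kibana's type).
type Attributes = Record<string, unknown>;

interface SavedObjectDoc {
  id: string;
  type: string;
  attributes: Attributes;
}

// Moves the value of sent_at to acknowledged_at and drops the old field.
// Documents without sent_at keep their attributes unchanged, so running the
// transform on an already-migrated document is harmless.
function renameSentAt(doc: SavedObjectDoc): SavedObjectDoc {
  const { sent_at, ...rest } = doc.attributes;
  if (sent_at === undefined) {
    return { ...doc, attributes: rest };
  }
  return { ...doc, attributes: { ...rest, acknowledged_at: sent_at } };
}
```

Keeping the transform a pure function of the document makes the happy path easy to unit test before hooking it into the real migration mechanism.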

@nchaulet
Member Author

nchaulet commented Sep 3, 2020

Better fix in #75693 and #76589.

@nchaulet closed this Sep 3, 2020
@nchaulet deleted the feature-refacto-action-sent-at branch September 3, 2020 01:36