@aleksmaus
Contributor

What is the problem this PR solves?

Replaces the hole detection, refresh, and refetch logic used for seq_no-based index change monitoring and queries with the new custom Elasticsearch Fleet APIs, which handle refreshes and waiting for checkpoints under the hood.

How does this PR solve the problem?

  • Switch to the new _fleet/_fleet_msearch and _fleet/_fleet_search Fleet API
    endpoints for the searches that required refreshes and waiting for
    checkpoints. The new APIs handle the refreshes and checkpoint waits.
  • Separate the queues for _msearch and _fleet_msearch, so that searches
    without a checkpoint wait are not delayed. Use the _fleet/_fleet_msearch
    endpoint if a search is requested with wait_for_checkpoints. Use
    _fleet/_fleet_search for the monitor hits fetch.
  • Had to copy over the search and msearch wrappers from the go-elasticsearch
    library and customize them for _fleet_search and _fleet_msearch.
    These can be removed once the library is updated for the new
    endpoints.
  • Removed the hole detection and refresh op code, as it is no longer
    used.
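
As a rough illustration of the queue split described in the list above, the routing decision could look like the sketch below. The type and function names (queueType, selectQueue, endpoint) are hypothetical and are not taken from the PR; only the two endpoint paths come from the description.

```go
package main

import "fmt"

// queueType is a hypothetical enum for this sketch; the constants in the PR
// itself may be named and ordered differently.
type queueType int

const (
	queueSearch      queueType = iota // plain _msearch, no checkpoint wait
	queueFleetSearch                  // _fleet/_fleet_msearch, waits for checkpoints
)

// selectQueue routes a search to the fleet queue only when it carries
// checkpoints to wait for, so ordinary searches are not held up behind it.
func selectQueue(checkpoints []int64) queueType {
	if len(checkpoints) > 0 {
		return queueFleetSearch
	}
	return queueSearch
}

// endpoint returns the Elasticsearch path used when flushing each queue.
func endpoint(q queueType) string {
	if q == queueFleetSearch {
		return "/_fleet/_fleet_msearch"
	}
	return "/_msearch"
}

func main() {
	fmt.Println(endpoint(selectQueue(nil)))         // prints "/_msearch"
	fmt.Println(endpoint(selectQueue([]int64{35}))) // prints "/_fleet/_fleet_msearch"
}
```

Keeping the decision in one small function means each queue can be flushed on its own schedule without the search callers knowing which endpoint they end up on.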

How to test this PR locally

Run full regression testing, specifically in the areas that dispatch actions and handle policy changes.

Checklist

  • I have commented my code, particularly in hard-to-understand areas

Related issues

Wireshark captures for the new requests

POST /.fleet-actions/_fleet/_fleet_search?wait_for_checkpoints=35 HTTP/1.1
Host: localhost:9200
User-Agent: Elastic-Fleet-Server/8.0.0 (darwin; amd64; a481a70; 2021-10-28 01:15:23 +0000 UTC)
Content-Length: 209
Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==
Content-Type: application/json
X-Elastic-Client-Meta: es=7.p,go=1.17.2,t=7.p,hc=1.17.2
X-Elastic-Product-Origin: fleet
Accept-Encoding: gzip

{"query":{"bool":{"filter":[{"range":{"_seq_no":{"gt":34}}},{"range":{"_seq_no":{"lte":35}}},{"range":{"expiration":{"gt":"2021-10-28T01:26:00Z"}}}]}},"seq_no_primary_term":true,"size":1000,"sort":["_seq_no"]}

HTTP/1.1 200 OK
X-elastic-product: Elasticsearch
content-type: application/json
content-encoding: gzip
content-length: 454

{"took":100,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":null,"hits":[{"_index":".fleet-actions-7","_id":"C8yAxHwBkUj3cCox_oP4","_seq_no":35,"_primary_term":3,"_score":null,"_source":{"action_id":"9b10c039-766c-4e7a-b226-144b0346e74f","@timestamp":"2021-10-28T01:26:00.695Z","expiration":"2021-10-28T01:31:00.695Z","type":"INPUT_ACTION","input_type":"osquery","agents":["6a339113-6722-43b1-a5f8-3e95fee0d09d"],"user_id":"elastic","data":{"id":"1da0e280-c0f3-4e1f-9f06-8597c1fd31bb","query":"select * from users limit 3 ","ecs_mapping":{}}},"sort":[35]}]}}   

POST /_fleet/_fleet_msearch HTTP/1.1
Host: localhost:9200
User-Agent: Elastic-Fleet-Server/8.0.0 (darwin; amd64; a481a70; 2021-10-28 01:15:23 +0000 UTC)
Content-Length: 363
Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==
Content-Type: application/json
X-Elastic-Client-Meta: es=7.p,go=1.17.2,t=7.p,hc=1.17.2
X-Elastic-Product-Origin: fleet
Accept-Encoding: gzip

{"index": ".fleet-actions", "wait_for_checkpoints": [35]}
{"_source":{"excludes":["agents"]},"query":{"bool":{"filter":[{"range":{"_seq_no":{"gt":35}}},{"range":{"_seq_no":{"lte":35}}},{"range":{"expiration":{"gt":"2021-10-28T01:26:03Z"}}},{"terms":{"agents":["6a339113-6722-43b1-a5f8-3e95fee0d09d"]}}]}},"seq_no_primary_term":true,"size":100,"sort":["_seq_no"]}

HTTP/1.1 200 OK
X-elastic-product: Elasticsearch
content-type: application/json
content-encoding: gzip
content-length: 158

{"took":1,"responses":[{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]},"status":200}]}   
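
For reference, the request path in the first capture could be assembled along the lines of the sketch below. The helper name fleetSearchPath is hypothetical, and joining several checkpoints with commas is an assumption here; the capture only shows a single checkpoint.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// fleetSearchPath builds a path such as
// /.fleet-actions/_fleet/_fleet_search?wait_for_checkpoints=35.
// Joining multiple checkpoints with commas is an assumption of this sketch.
func fleetSearchPath(index string, checkpoints []int64) string {
	path := "/" + url.PathEscape(index) + "/_fleet/_fleet_search"
	if len(checkpoints) == 0 {
		return path
	}
	parts := make([]string, len(checkpoints))
	for i, cp := range checkpoints {
		parts[i] = strconv.FormatInt(cp, 10)
	}
	q := url.Values{}
	q.Set("wait_for_checkpoints", strings.Join(parts, ","))
	// Encode percent-escapes the commas when there is more than one checkpoint.
	return path + "?" + q.Encode()
}

func main() {
	fmt.Println(fleetSearchPath(".fleet-actions", []int64{35}))
}
```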

@aleksmaus aleksmaus added enhancement New feature or request Team:Elastic-Agent Label for the Agent team Team:Fleet Label for the Fleet team backport-v7.16.0 Automated backport with mergify labels Oct 28, 2021
@mergify
Contributor

mergify bot commented Oct 28, 2021

This pull request does not have a backport label. Could you fix it @aleksmaus? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • v/d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Oct 28, 2021
@elasticmachine
Contributor

elasticmachine commented Oct 28, 2021

💚 Build Succeeded

Build stats

  • Start Time: 2021-10-29T17:59:10.006+0000

  • Duration: 9 min 21 sec

  • Commit: da23522

Test stats 🧪

Test Results
Failed 0
Passed 226
Skipped 0
Total 226

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@aleksmaus aleksmaus removed backport-skip Skip notification from the automated backport with mergify backport-v7.16.0 Automated backport with mergify labels Oct 28, 2021
@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Oct 28, 2021

@scunningham scunningham left a comment

Less code than before. Nice!

return err
}

buf.WriteString("{")

Added extra blit for the normal cases where there's only one index.

Contributor Author

With the additional param in there, there are fewer branches to cover for balancing the brackets, and it's easier to understand this way. Let me know if you still want to optimize this blit away.

buf.WriteString("{ }\n")
} else {
buf.WriteString(`{"index": "`)
buf.WriteString(`"index": "`)

Changed logic for the case where index = "". Should fail on empty index name then.

Contributor Author

updated

if len(checkpoints) > 0 {
buf.WriteString(`, "wait_for_checkpoints": `)
// Write array as string, example: [1,2,3]
buf.WriteString(strings.ReplaceAll(fmt.Sprint([]int64(checkpoints)), " ", ","))

doesn't happen enough to optimize, but curious if there's a faster way than two passes.

Contributor Author

simple loop is faster

BenchmarkTwoPhases/0-16  	 9670161	       122.5 ns/op	      26 B/op	       2 allocs/op
BenchmarkTwoPhases/1-16  	 5927617	       203.5 ns/op	      40 B/op	       3 allocs/op
BenchmarkTwoPhases/2-16  	 3636922	       331.2 ns/op	      56 B/op	       5 allocs/op
BenchmarkTwoPhases/3-16  	 2711920	       442.8 ns/op	      64 B/op	       6 allocs/op
BenchmarkTwoPhases/4-16  	 2291353	       566.9 ns/op	      88 B/op	       7 allocs/op
BenchmarkTwoPhases/5-16  	 1854615	       684.2 ns/op	      96 B/op	       8 allocs/op
BenchmarkTwoPhases/6-16  	 1544910	       702.6 ns/op	     104 B/op	       9 allocs/op
BenchmarkTwoPhases/7-16  	 1469547	       822.2 ns/op	     112 B/op	      10 allocs/op
BenchmarkTwoPhases/8-16  	 1314674	       905.5 ns/op	     136 B/op	      11 allocs/op
BenchmarkTwoPhases/9-16  	 1000000	      1019 ns/op	     144 B/op	      12 allocs/op
BenchmarkLoop/0-16       	758645307	         1.548 ns/op	       0 B/op	       0 allocs/op
BenchmarkLoop/1-16       	51346444	        23.28 ns/op	       4 B/op	       1 allocs/op
BenchmarkLoop/2-16       	41733303	        28.96 ns/op	       8 B/op	       1 allocs/op
BenchmarkLoop/3-16       	35340536	        34.36 ns/op	       8 B/op	       1 allocs/op
BenchmarkLoop/4-16       	26870481	        45.16 ns/op	      16 B/op	       1 allocs/op
BenchmarkLoop/5-16       	24264476	        51.31 ns/op	      16 B/op	       1 allocs/op
BenchmarkLoop/6-16       	21013568	        56.16 ns/op	      16 B/op	       1 allocs/op
BenchmarkLoop/7-16       	18831969	        64.75 ns/op	      16 B/op	       1 allocs/op
BenchmarkLoop/8-16       	15635251	        72.55 ns/op	      24 B/op	       1 allocs/op
BenchmarkLoop/9-16       	14547670	        78.62 ns/op	      24 B/op	       1 allocs/op

will update

)

if queue.ty == kQueueFleetSearch {
req := es.FleetMsearchRequest{

Add comment here about it being temporary.

Contributor Author

added comment

@ruflin
Collaborator

ruflin commented Oct 29, 2021

Nice to see this made it into Elasticsearch. It's a pity that we currently need to have all the Elasticsearch client code in here. Could you follow up with the client team and open an issue to get this added to the client, so we can remove it again later?

On the ES PR I see a 7.16 label but not sure if this made it into 7.16? If yes, should we also backport it?

@scunningham

Nice to see this made it into Elasticsearch. It's a pity that we currently need to have all the Elasticsearch client code in here. Could you follow up with the client team and open an issue to get this added to the client, so we can remove it again later?

On the ES PR I see a 7.16 label but not sure if this made it into 7.16? If yes, should we also backport it?

I advised against backporting it because we are past feature freeze, and this only adds risk without providing a ton of upside.

As for the Elasticsearch code, this is, in theory, temporary. This logic should roll into the msearch API once it gets past "experimental". I think we should defer adding it to the Elastic client until we are convinced that it will not make it into the mainstream for whatever reason.

@aleksmaus
Contributor Author

As for the Elasticsearch code, this is, in theory, temporary. This logic should roll into the msearch API once it gets past "experimental". I think we should defer adding it to the Elastic client until we are convinced that it will not make it into the mainstream for whatever reason.

I agree with @scunningham on this. Since the API is experimental, customized for Fleet, and we will possibly keep iterating on it, the client library will always be behind. We would end up switching back and forth between using the client library (and removing the copied-over code) and copying the code over again to customize it for the latest changes in the API.
I will keep an eye on the client library and pick up the updates when it makes sense.

@aleksmaus aleksmaus merged commit a2fb073 into elastic:master Oct 29, 2021
@ruflin ruflin added v8.0.0 and removed backport-skip Skip notification from the automated backport with mergify labels Nov 11, 2021
mergify bot pushed a commit that referenced this pull request Nov 11, 2021
…ticsearch Fleet APIs, remove holes detection and refreshes (#814)

* Switch to the new _fleet/_fleet_search and _fleet/_fleet_msearch Elasticsearch Fleet APIs, remove holes detection and refreshes

* Switch to the new _fleet/_fleet_msearch and _fleet/_fleet_search Fleet APIs
  endpoints for the searches that required refreshes and wait for
  checkpoints. The new API handles refreshes and checkpoints waits.
* Separate queues for _msearch and _fleet_msearch, to avoid delays on
  searches without checkpoints wait. Use _fleet/_fleet_msearch endpoint if search is requested with
  wait_for_checkpoints. Use _fleet/_fleet_search for the monitor hits
  fetch.
* Had to copy over the search and msearch wrappers from go-elasticsearch
  library and customize them for _fleet_search and _fleet_msearch.
  These could be removed once the library is updated for these new
  endpoints.
* Removed the holes detection and refresh op code as it's not longer
  used.

(cherry picked from commit a2fb073)
@ruflin
Collaborator

ruflin commented Nov 11, 2021

@aleksmaus This change is not in 8.0 but only master. Is my assumption correct that this should also be in the 8.0 branch?

Here is the automatic backport, please merge if you agree: #863

v1v pushed a commit that referenced this pull request Nov 11, 2021
aleksmaus added a commit that referenced this pull request Nov 11, 2021
…ticsearch Fleet APIs, remove holes detection and refreshes (#814) (#863)

Co-authored-by: Aleksandr Maus <[email protected]>
Labels

enhancement New feature or request Team:Elastic-Agent Label for the Agent team Team:Fleet Label for the Fleet team v8.0.0

4 participants