@michalpristas
Contributor

What is the problem this PR solves?

This PR fixes a problem where an agent gets unenrolled under heavy load: when the agent managing Fleet Server cannot check in to its own server, it falls back to unenrolling.
Closes #741

How does this PR solve the problem?

The problem is solved by adding an internal endpoint that is used for communication on the local network (with the agent handling Fleet Server).
It lets Fleet Server spin up two sets of handlers: one on the public port 8220 and one on a port defined in the config.

How to test this PR locally

This needs to be tested together with the corresponding elastic-agent work: elastic/beats#28993

  • Start the stack
  • Install an agent with Fleet Server in its policy
  • Check the listening ports
sh-3.2# lsof -i -P | grep LISTEN | grep fleet
fleet-ser  7056            root   19u  IPv4 0xba7881a9227099a5      0t0    TCP localhost:{random_port} (LISTEN)
fleet-ser  7056            root   21u  IPv6 0xba7881a91284721d      0t0    TCP *:8220 (LISTEN)
  • Run Wireshark and set the filter to the random port; there should be some communication
  • Set the filter to port 8220; there should be no communication
  • Enroll a new agent from another VM
  • There should now be some communication on both ports

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

apmmachine and others added 30 commits November 3, 2021 05:28
(cherry picked from commit 8a4855b)

Co-authored-by: Sean Cunningham <[email protected]>
This came out of a debugging session around fleet-server, during which the meaning of some log messages was not clear to me.
…ticsearch Fleet APIs, remove holes detection and refreshes (elastic#814) (elastic#863)

* Switch to the new _fleet/_fleet_search and _fleet/_fleet_msearch Elasticsearch Fleet APIs, remove holes detection and refreshes

* Switch to the new _fleet/_fleet_msearch and _fleet/_fleet_search Fleet APIs
  endpoints for the searches that required refreshes and wait for
  checkpoints. The new API handles refreshes and checkpoints waits.
* Separate queues for _msearch and _fleet_msearch, to avoid delays on
  searches without checkpoints wait. Use _fleet/_fleet_msearch endpoint if search is requested with
  wait_for_checkpoints. Use _fleet/_fleet_search for the monitor hits
  fetch.
* Had to copy over the search and msearch wrappers from go-elasticsearch
  library and customize them for _fleet_search and _fleet_msearch.
  These could be removed once the library is updated for these new
  endpoints.
* Removed the holes detection and refresh op code as it is no longer
  used.

(cherry picked from commit a2fb073)

Co-authored-by: Aleksandr Maus <[email protected]>
…elastic#864)

* Do not depend on agent.Id as that field was not added until 7.15

(cherry picked from commit 6382114)

* Migrate agent.id field from 7.14 to 7.15+

(cherry picked from commit aeb4b66)

* Handle 404 on .fleet-agent index as a noop during migration.

(cherry picked from commit 130056a)

Co-authored-by: Sean Cunningham <[email protected]>
* Periodic expired actions cleanup

* Fix make check

* Fix TestConfig unit test

* Put back WithRefresh in integration tests actions setup

* Switch the actions cleanup to use bulker.MDelete instead of Delete

* Improve 404 status handling

(cherry picked from commit 6694c08)

Co-authored-by: Aleksandr Maus <[email protected]>
* use ecs zerolog lib for logging

(cherry picked from commit 6627876)

* update checksums

(cherry picked from commit 998db6a)

* run check on 1.17

(cherry picked from commit 72dccaa)

Co-authored-by: bryan <[email protected]>
* Add default_api_key_history field to the agent schema

* Append agent.default_api_key_history on API key change and invalidate the keys on ack

(cherry picked from commit dff3595)

Co-authored-by: Aleksandr Maus <[email protected]>
Adds support to enable instrumentation via the APM Go agent. New config
options have been added to the `Server` input which could be set up in
the `fleet-server` integration configuration.

The added instrumentation covers the `fleet-server` http server and the
`go-elasticsearch` client.

A sample of the configuration that's been added (`instrumentation`):

```yaml
inputs:
  - type: fleet-server
    server:
      instrumentation:
        enabled: true
        hosts: ["localhost:8200"]
        environment: production
        secret_token: token
        api_key: apikey
```

Signed-off-by: Marc Lopez Rubio <[email protected]>
(cherry picked from commit ade74c7)

Co-authored-by: Marc Lopez Rubio <[email protected]>
…c#906) (elastic#910)

* Improve expired actions cleanup, use _delete_by_query instead

(cherry picked from commit fae23a3)

Co-authored-by: Aleksandr Maus <[email protected]>
)

* keep trucking on ES availability errors; more tests to come

(cherry picked from commit 7fb0138)

* don't attempt to distinguish between errors, just keep retrying

(cherry picked from commit 2c75552)

* move error blackholing up the stack so the monitor will never crash, added additional logging

(cherry picked from commit f5fead9)

* pr feedback

(cherry picked from commit 1886dc5)

* upped logging level, properly wrapped errors

(cherry picked from commit 97524dc)

Co-authored-by: bryan <[email protected]>
apmmachine and others added 5 commits December 2, 2021 05:26
…ic#964)

Adds TLS configuration options for the APM instrumentation, using env
vars to configure the APM HTTP Tracer since it currently doesn't support
setting those values in Golang. We'll follow up on this once the apm
tracer has a function to create a new tracer with configurable settings
via config struct.

Signed-off-by: Marc Lopez Rubio <[email protected]>
(cherry picked from commit 155d0e9)

Co-authored-by: Marc Lopez Rubio <[email protected]>
@michalpristas michalpristas self-assigned this Dec 7, 2021

mergify bot commented Dec 7, 2021

This pull request is now in conflicts. Could you fix it @michalpristas? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b backport_multiple_endpoints-8.0 upstream/backport_multiple_endpoints-8.0
git merge upstream/master
git push upstream backport_multiple_endpoints-8.0


mergify bot commented Dec 7, 2021

This pull request does not have a backport label. Could you fix it @michalpristas? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v/d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Dec 7, 2021
Labels

backport-skip Skip notification from the automated backport with mergify

Development

Successfully merging this pull request may close these issues.

Fleet server is unexpectedly unenrolled under load