Skip to content

Add code for running connectors in Agent#2787

Merged
artem-shelkovnikov merged 16 commits intomainfrom
artem/add-connectors-agent
Sep 3, 2024
Merged

Add code for running connectors in Agent#2787
artem-shelkovnikov merged 16 commits intomainfrom
artem/add-connectors-agent

Conversation

@artem-shelkovnikov
Copy link
Member

@artem-shelkovnikov artem-shelkovnikov commented Aug 28, 2024

Part of https://github.com/elastic/search-team/issues/8096

This PR adds a way to run Connectors Service in Agent.

Entry point is connectors/agent/cli.py - if it's ran from an Agent component via the python in .venv/bin/python it will enable Connector Service running within an Agent (docker or not).

This PR adds bare bone logic: positive cases work, so expect tricky bugs to still be out there if we haven't accounted for anything.

I've tried to keep it very small, and I think there's no way to make it simpler/smaller :)

Additionally - I had to exclude tests for the component from make test because it would break make test for people who have no access to the python agent client repository - this one will be opened later so we can revert this part of the changeset. If you need to run tests there is a make target make test-agent - I kept high coverage for future.

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)

Copy link
Member

@seanstory seanstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't really read the logic yet (since this is still in draft) but I like the structure/direction. :feelsgood:

jedrazb
jedrazb previously approved these changes Aug 29, 2024
Copy link
Contributor

@jedrazb jedrazb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, couple of nits

)
self.opts = V2Options()
self.buffer = sys.stdin.buffer
self.agent_config = ConnectorsAgentConfiguration()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it took me a while to realise agent_config == (connector)service_config , should we rename to "service_config" or "component_config"? "Agent" references the process running the service component, so it's super accurate name in this context

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll try renaming it, but I'm also not sure about this class - maybe we can even pass agent configuration inside, parse it and emit connectors configuration to just separate this out of the checkin handler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to ConnectorsAgentConfigurationWrapper and changed logic a bit + added code comments. I really could not make good names for stuff that I wrote here, open to suggestions

f"Error while running services in ConnectorServiceManager: {e}"
)
raise
finally:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we care about stopping running services here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should already be stopped - it's not possible to reach here without stopping. I can add additional step of verifying that everything stopped, but not sure it's needed

Copy link
Contributor

@jedrazb jedrazb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to "comment", did you manage to run this with es_agent_client as dependency?

@jedrazb jedrazb dismissed their stale review August 29, 2024 09:11

clicked approve instead of comment

@artem-shelkovnikov
Copy link
Member Author

I meant to "comment", did you manage to run this with es_agent_client as dependency?

Oh yes, this branch is 100% functioning

@artem-shelkovnikov artem-shelkovnikov force-pushed the artem/add-connectors-agent branch from b075733 to 86525f6 Compare August 29, 2024 10:03
@artem-shelkovnikov artem-shelkovnikov force-pushed the artem/add-connectors-agent branch from 60a1a86 to a058c7c Compare August 29, 2024 10:55
@artem-shelkovnikov artem-shelkovnikov changed the title Artem/add connectors agent Add code for running connectors in Agent Aug 29, 2024
@artem-shelkovnikov artem-shelkovnikov marked this pull request as ready for review August 29, 2024 15:26
@artem-shelkovnikov artem-shelkovnikov requested a review from a team August 29, 2024 15:26

test: .venv/bin/pytest .venv/bin/elastic-ingest
.venv/bin/pytest --cov-report term-missing --cov-fail-under 92 --cov-report html --cov=connectors --fail-slow=$(SLOW_TEST_THRESHOLD) -sv tests
.venv/bin/pytest --cov-report term-missing --cov-fail-under 92 --cov-report html --cov=connectors --fail-slow=$(SLOW_TEST_THRESHOLD) -sv tests --ignore tests/agent
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring tests/agent here cause repo that we build package from is private and it'll fail in CI. We can set it up in CI to pull the repo too, but occasional user of main can run into problems and I'd rather keep main clean and functioning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd challenge that we don't care about 2xx, in operational review discussions we've discussed that bumps in traffic

++

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what's that quote? :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return configuration


def add_defaults(config):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about method name 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. ;)

@@ -0,0 +1 @@
elastic-agent-client@git+https://github.com/elastic/python-elastic-agent-client@main
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now we just pick main, later we'll have more solid plan, hopefully

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Always releasable" enforced here 🤙

from connectors.agent.component import ConnectorsAgentComponent


class StubMultiService:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stub makes the test code much cleaner IMO so I went for it instead of a mock

jedrazb
jedrazb previously approved these changes Aug 30, 2024
Copy link
Contributor

@jedrazb jedrazb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Couple of questions about error logging and doc improvements suggestion, nothing blocking I think


test: .venv/bin/pytest .venv/bin/elastic-ingest
.venv/bin/pytest --cov-report term-missing --cov-fail-under 92 --cov-report html --cov=connectors --fail-slow=$(SLOW_TEST_THRESHOLD) -sv tests
.venv/bin/pytest --cov-report term-missing --cov-fail-under 92 --cov-report html --cov=connectors --fail-slow=$(SLOW_TEST_THRESHOLD) -sv tests --ignore tests/agent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd challenge that we don't care about 2xx, in operational review discussions we've discussed that bumps in traffic

++

the method returns False.
"""

# TODO: find a good link to what this object is.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to be helpful, so digged into proto def, https://github.com/elastic/elastic-agent-client/blob/main/elastic-agent-client.proto#L168

But it seems like source is a an arbitrary struct hehe (so we have to look somewhere else for the actual definition) and take a leap of faith now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah judging by the protobuf files it's arbitrary:

    // Source is the original configuration of this unit configuration object in the agent policy.
    // Only standard fields are defined as explicit types, additional fields can be parsed from source.
    //
    // This source field will almost always contain arbitrary unit configuration fields beyond those
    // explicitly defined in this message type.
    google.protobuf.Struct source = 1;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I interrogated this during my POC was to just add logging. I'd get a checkin event, deserialize it, re-serialize it as json, then dump it in the logs.

"""
msg = (
f"This connector component can't handle action requests. Received: {action}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we include the error log? I don't see any other "wrapper" logic that would surface this to the user / logs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should log the error when called, right?

cc @seanstory will it terminate the process?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
if self._running:
msg = f"{self.__class__.__name__} is already running."
raise ServiceAlreadyRunningError(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add the error log, I'm not sure we catch-and-log the exception anywhere

Copy link
Member Author

@artem-shelkovnikov artem-shelkovnikov Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is not caught, it'll just crash the caller. Supposedly this case is "never gonna happen", so I'd just keep it like this.

@@ -0,0 +1 @@
elastic-agent-client@git+https://github.com/elastic/python-elastic-agent-client@main
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Always releasable" enforced here 🤙

seanstory
seanstory previously approved these changes Aug 30, 2024
Copy link
Member

@seanstory seanstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bunch of little nits, but I'm ok with addressing them in follow-ups rather than blocking this.

Awesome stuff. I want to see a demo an make some toasts!

.venv/bin/pytest --cov-report term-missing --cov-fail-under 92 --cov-report html --cov=connectors --fail-slow=$(SLOW_TEST_THRESHOLD) -sv tests
.venv/bin/pytest --cov-report term-missing --cov-fail-under 92 --cov-report html --cov=connectors --fail-slow=$(SLOW_TEST_THRESHOLD) -sv tests --ignore tests/agent

test-agent: .venv/bin/pytest .venv/bin/elastic-ingest install-agent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this to CI now, to avoid forgetting or letting it get too behind in coverage. I think that ElasticMachine should be able to clone the private repo. Hit me up if it's not working.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't have anything to say here yet, let's just remove the file for now.

from connectors.agent.config import ConnectorsAgentConfigurationWrapper
from connectors.agent.protocol import ConnectorActionHandler, ConnectorCheckinHandler
from connectors.agent.service_manager import ConnectorServiceManager
from connectors.services.base import MultiService
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using the MulitService from Connectors instead of from the elastic_agent_client. Which I think is ok (and probably for the best) but just wanted to flag in case it wasn't intentional. I hadn't tested using ActionsService or CheckinV2Service with the Connectors MultiService.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically same code - we need to clean up the python-agent repo from some classes like this or move them to example namespace to not distribute these classes.

self.agent_connectors_config_wrapper = agent_connectors_config_wrapper
self.service_manager = service_manager

async def apply_from_client(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a lot like the "Fake example", but I'm worried we'll need to implement more here. This seems to only look at the first unit's config, but doesn't consider log_level, features, or anything else from the checkin service's sync_component or sync_units method.

return configuration


def add_defaults(config):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. ;)


main()

loop.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing's asserted here. Intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. main blocks so if the test fails the test will be killed by a timeout. I can add some asserts for log messages too, but not sure it's needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just a comment then, with what you've written in this thread. 👍

}

if source.fields.get("api_key"):
es_creds["api_key"] = source["api_key"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@artem-shelkovnikov commenting so that it doesn't get lost, by default we would get Fleet-specific api key format, this should do the trick to turn it to our version:

    if source.fields.get("api_key"):
        api_key = source["api_key"]
        # if beats_logstash_format
        if ":" in api_key:
            api_key = base64.b64encode(api_key.encode()).decode()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified that this works

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the investigation! I'm gonna add this code as well :)

jedrazb
jedrazb previously approved these changes Sep 3, 2024
@artem-shelkovnikov artem-shelkovnikov dismissed stale reviews from jedrazb and seanstory via 428b8ad September 3, 2024 14:30
Copy link
Member

@seanstory seanstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-approve

@artem-shelkovnikov artem-shelkovnikov merged commit fd20dfb into main Sep 3, 2024
@artem-shelkovnikov artem-shelkovnikov deleted the artem/add-connectors-agent branch September 3, 2024 15:18
@github-actions
Copy link

github-actions bot commented Sep 3, 2024

💔 Failed to create backport PR(s)

The backport operation could not be completed due to the following error:
There are no branches to backport to. Aborting.

The backport PRs will be merged automatically after passing CI.

To backport manually run:
backport --pr 2787 --autoMerge --autoMergeMethod squash

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants