Skip to content

Add support for deploy service agent auto updates#31982

Merged
bernardjkim merged 26 commits intomasterfrom
bernard/ecs-auto-update
Oct 10, 2023
Merged

Add support for deploy service agent auto updates#31982
bernardjkim merged 26 commits intomasterfrom
bernard/ecs-auto-update

Conversation

@bernardjkim
Copy link
Copy Markdown
Contributor

@bernardjkim bernardjkim commented Sep 15, 2023

Contributes to #28780, https://github.com/gravitational/cloud/issues/4880

This PR adds support for auto updates of Deploy Service Agents. More info in RFD.
Some changes from the RFD:

  • The RFD described the upgrade process by first calling ecs:CreateService and then ecs:DeleteService, to update a service, but the implementation just uses ecs:UpdateService to update the service.

Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/automaticupgrades/config.go
Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/automaticupgrades/version.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Copy link
Copy Markdown
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First quick pass, I'll do another more in-depth one later.

Comment thread lib/automaticupgrades/version.go Outdated
Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/auth/integration/integrationv1/awsoidc.go Outdated
Comment thread lib/automaticupgrades/version.go Outdated
Comment thread lib/automaticupgrades/version.go Outdated
Comment thread lib/service/awsoidc.go Outdated
@smallinsky smallinsky self-requested a review September 20, 2023 13:19
"ecs:DescribeClusters", "ecs:CreateCluster", "ecs:PutClusterCapacityProviders",
"ecs:DescribeServices", "ecs:CreateService", "ecs:UpdateService",
"ecs:RegisterTaskDefinition",
"ecs:RegisterTaskDefinition", "ecs:DescribeTaskDefinition", "ecs:DeregisterTaskDefinition",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to think of some way to let the users know that their current set of permissions might not be enough, and that they must re-run the script.

Maybe a warning in the logs?
Cluster Alert might also work

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing anything here?

For new deployments, it should be good (we add all the permissions).
For current deployments, it will fail.
Do we have any kind of alerting?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will log the following message if permissions are lacking, but I don't know how effective that will be. https://github.com/gravitational/teleport/pull/31982/files#diff-b2b691cbb3f300d6396f5b1ff26ac2569a944859ac8041067d8c97b23037f9c3R321-R325

I'm not familiar with Cluster Alerts, do you think this would be a better approach?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log messages are stored internally (as in Teleport Cloud infra).
They will not be visible to the customer.

Maybe we can get away with looking for those logs ourselves and letting the affected customers know.
But honestly, that feels hacky, error-prone and not scalable.

With a ClusterAlert the customer would know.
I'm not very familiar with those as well, but I believe it would work better (even if we do this, let's keep the warning as well).
@r0mant What do you think?

As for method to fix:

Re-run deploy service configuration script to update permissions.

We don't have an easy way for the user to re-run the script.
It is only generated at that step when enrolling an RDS database.

Can we generate the script URL?
It should look something like this:

curl 'https://<tenant>.teleport.sh/webapi/scripts/integrations/configure/deployservice-iam.sh?integrationName=<integration-name>&awsRegion=<aws-region>&taskRole=<taskRole>&role=<integrationRole>'

I think all the variables are present in the current context.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a cluster alert here fec081d

Do you know of a way to get the task role in the current context? The only way I'm aware of is extracting the task role from the task definition. But that would require that the instance already has permissions to describe the task definition.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we can.
I thought it was under Service, but it is actually under Task Definition 😭

In that case, we can only ask them to fill it with the same role that was used previously.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay the alert might look something like the following:

Open Amazon CloudShell and copy/paste the following command to reconfigure integration. Replace TASK_ROLE with deploy service task role name. bash -c "$(curl 'https://bernard-dev.cloud.gravitational.io/webapi/scripts/integrations/configure/deployservice-iam.sh?awsRegion=us-west-2&integrationName=database-access&role=bernard-dev-database-access&taskRole=TASK_ROLE')"

It is pretty verbose tho. We might want to consider adding a docs page later and linking that instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave cluster alerts out of scope for now and revisit at a later time.

Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/service/service.go Outdated
Comment thread lib/service/awsoidc.go Outdated
@marcoandredinis marcoandredinis self-requested a review September 25, 2023 11:26
Copy link
Copy Markdown
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the overall approach makes sense but I left a few suggestions about code structure/organization.

Comment thread lib/automaticupgrades/version.go
Comment thread lib/integrations/awsoidc/deployservice.go
Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/integrations/awsoidc/deployservice_update.go
Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Copy link
Copy Markdown
Contributor

@marcoandredinis marcoandredinis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested this (very basic test) and it worked as expected 👍

image

Left some comments

Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/service/awsoidc.go Outdated
- Perform updates in parallel
- Add additional logging
- Add additional documentation
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/integrations/awsoidc/deployservice_update.go
Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/integrations/awsoidc/deployservice_update.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/service/awsoidc.go Outdated
Comment thread lib/service/awsoidc.go Outdated
@bernardjkim bernardjkim requested a review from r0mant October 5, 2023 00:18
@bernardjkim bernardjkim enabled auto-merge October 10, 2023 21:55
@bernardjkim bernardjkim added this pull request to the merge queue Oct 10, 2023
Merged via the queue into master with commit 27fae1a Oct 10, 2023
@bernardjkim bernardjkim deleted the bernard/ecs-auto-update branch October 10, 2023 22:18
@public-teleport-github-review-bot
Copy link
Copy Markdown

@bernardjkim See the table below for backport results.

Branch Result
branch/v13 Failed
branch/v14 Failed

bernardjkim added a commit that referenced this pull request Oct 11, 2023
* Add support for ecs agent auto updates

* fix unit test

* Remove unused var

* Addres feedback

* Use list of available AWS database regions

* Run update task on proxy instances

* Revert GenerateAWSOIDCToken

* Move const to start of file

* Address feedback

* Create separate DeployServiceUpdater struct

* Address feedback

- Perform updates in parallel
- Add additional logging
- Add additional documentation

* debug

* Address feedback

- Check OwnershipTags
- Use semaphore pkg
- Release semaphore lease on success

* Make OwnershipTags explicitly required

* Add cluster alert

* Fix typo and update message

* Revert cluster alert

* Update err messages

* Check minimum compatible server version

* Update log msg
bernardjkim added a commit that referenced this pull request Oct 11, 2023
* Add support for ecs agent auto updates

* fix unit test

* Remove unused var

* Addres feedback

* Use list of available AWS database regions

* Run update task on proxy instances

* Revert GenerateAWSOIDCToken

* Move const to start of file

* Address feedback

* Create separate DeployServiceUpdater struct

* Address feedback

- Perform updates in parallel
- Add additional logging
- Add additional documentation

* debug

* Address feedback

- Check OwnershipTags
- Use semaphore pkg
- Release semaphore lease on success

* Make OwnershipTags explicitly required

* Add cluster alert

* Fix typo and update message

* Revert cluster alert

* Update err messages

* Check minimum compatible server version

* Update log msg
github-merge-queue Bot pushed a commit that referenced this pull request Oct 13, 2023
* Add support for ecs agent auto updates

* fix unit test

* Remove unused var

* Addres feedback

* Use list of available AWS database regions

* Run update task on proxy instances

* Revert GenerateAWSOIDCToken

* Move const to start of file

* Address feedback

* Create separate DeployServiceUpdater struct

* Address feedback

- Perform updates in parallel
- Add additional logging
- Add additional documentation

* debug

* Address feedback

- Check OwnershipTags
- Use semaphore pkg
- Release semaphore lease on success

* Make OwnershipTags explicitly required

* Add cluster alert

* Fix typo and update message

* Revert cluster alert

* Update err messages

* Check minimum compatible server version

* Update log msg
@camscale camscale mentioned this pull request Oct 16, 2023
github-merge-queue Bot pushed a commit that referenced this pull request Oct 17, 2023
)

* Add support for deploy service agent auto updates (#31982)

* Add support for ecs agent auto updates

* fix unit test

* Remove unused var

* Addres feedback

* Use list of available AWS database regions

* Run update task on proxy instances

* Revert GenerateAWSOIDCToken

* Move const to start of file

* Address feedback

* Create separate DeployServiceUpdater struct

* Address feedback

- Perform updates in parallel
- Add additional logging
- Add additional documentation

* debug

* Address feedback

- Check OwnershipTags
- Use semaphore pkg
- Release semaphore lease on success

* Make OwnershipTags explicitly required

* Add cluster alert

* Fix typo and update message

* Revert cluster alert

* Update err messages

* Check minimum compatible server version

* Update log msg

* Check for nil auth client
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants