Add support for deploy service agent auto updates #31982
r0mant left a comment:
First quick pass, I'll do another more in-depth one later.
  "ecs:DescribeClusters", "ecs:CreateCluster", "ecs:PutClusterCapacityProviders",
  "ecs:DescribeServices", "ecs:CreateService", "ecs:UpdateService",
- "ecs:RegisterTaskDefinition",
+ "ecs:RegisterTaskDefinition", "ecs:DescribeTaskDefinition", "ecs:DeregisterTaskDefinition",
We have to think of some way to let the users know that their current set of permissions might not be enough, and that they must re-run the script.
Maybe a warning in the logs?
Cluster Alert might also work
Are we doing anything here?
For new deployments, it should be good (we add all the permissions).
For current deployments, it will fail.
Do we have any kind of alerting?
We will log the following message if permissions are lacking, but I don't know how effective that will be. https://github.com/gravitational/teleport/pull/31982/files#diff-b2b691cbb3f300d6396f5b1ff26ac2569a944859ac8041067d8c97b23037f9c3R321-R325
I'm not familiar with Cluster Alerts, do you think this would be a better approach?
The log messages are stored internally (as in Teleport Cloud infra).
They will not be visible to the customer.
Maybe we can get away with looking for those logs ourselves and letting the affected customers know.
But honestly, that feels hacky, error-prone and not scalable.
With a ClusterAlert the customer would know.
I'm not very familiar with those either, but I believe it would work better (even if we do this, let's keep the warning as well).
@r0mant What do you think?
As for the method to fix:
Re-run deploy service configuration script to update permissions.
We don't have an easy way for the user to re-run the script.
It is only generated at that step when enrolling an RDS database.
Can we generate the script URL?
It should look something like this:
curl 'https://<tenant>.teleport.sh/webapi/scripts/integrations/configure/deployservice-iam.sh?integrationName=<integration-name>&awsRegion=<aws-region>&taskRole=<taskRole>&role=<integrationRole>'
I think all the variables are present in the current context.
I've added a cluster alert here fec081d
Do you know of a way to get the task role in the current context? The only way I'm aware of is extracting the task role from the task definition. But that would require that the instance already has permissions to describe the task definition.
Yeah, I don't think we can.
I thought it was under Service, but it is actually under Task Definition 😭
In that case, we can only ask them to fill it with the same role that was used previously.
Okay the alert might look something like the following:
Open Amazon CloudShell and copy/paste the following command to reconfigure integration. Replace TASK_ROLE with deploy service task role name.
bash -c "$(curl 'https://bernard-dev.cloud.gravitational.io/webapi/scripts/integrations/configure/deployservice-iam.sh?awsRegion=us-west-2&integrationName=database-access&role=bernard-dev-database-access&taskRole=TASK_ROLE')"
It is pretty verbose tho. We might want to consider adding a docs page later and linking that instead.
Let's leave cluster alerts out of scope for now and revisit at a later time.
r0mant left a comment:
I think the overall approach makes sense but I left a few suggestions about code structure/organization.
- Perform updates in parallel - Add additional logging - Add additional documentation
@bernardjkim See the table below for backport results.
* Add support for ecs agent auto updates
* fix unit test
* Remove unused var
* Addres feedback
* Use list of available AWS database regions
* Run update task on proxy instances
* Revert GenerateAWSOIDCToken
* Move const to start of file
* Address feedback
* Create separate DeployServiceUpdater struct
* Address feedback
  - Perform updates in parallel
  - Add additional logging
  - Add additional documentation
* debug
* Address feedback
  - Check OwnershipTags
  - Use semaphore pkg
  - Release semaphore lease on success
* Make OwnershipTags explicitly required
* Add cluster alert
* Fix typo and update message
* Revert cluster alert
* Update err messages
* Check minimum compatible server version
* Update log msg
* Add support for deploy service agent auto updates (#31982)
  * Add support for ecs agent auto updates
  * fix unit test
  * Remove unused var
  * Addres feedback
  * Use list of available AWS database regions
  * Run update task on proxy instances
  * Revert GenerateAWSOIDCToken
  * Move const to start of file
  * Address feedback
  * Create separate DeployServiceUpdater struct
  * Address feedback
    - Perform updates in parallel
    - Add additional logging
    - Add additional documentation
  * debug
  * Address feedback
    - Check OwnershipTags
    - Use semaphore pkg
    - Release semaphore lease on success
  * Make OwnershipTags explicitly required
  * Add cluster alert
  * Fix typo and update message
  * Revert cluster alert
  * Update err messages
  * Check minimum compatible server version
  * Update log msg
* Check for nil auth client

Contributes to #28780, https://github.com/gravitational/cloud/issues/4880
This PR adds support for auto updates of Deploy Service Agents. More info in RFD.
Some changes from the RFD:
The RFD called for ecs:CreateService and then ecs:DeleteService to update a service, but the implementation just uses ecs:UpdateService.