Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use dynamodb instead of ssm for JIT config #4446

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

id
Copy link

@id id commented Feb 27, 2025

Add extra capabilities to support ephemeral multiarch runners.
In workflow jobs with large matrix, github can request large number of runners simultaneously. This can cause the ssm to be overwhelmed.
This PR replaces the ssm with dynamodb to store the JIT config.

Also fixed a couple things:

  • runners_ssm_housekeeper schema was not aligned across modules
  • syntax for env in github_agent.ubuntu.pkr.hcl

Added a couple of improvements:

  • updated baseline provisioning script for prebuilt runners to make it closer to what github hosted runners do
  • startup-runner.sh now automatically discovers attached ebs volume, mounts it to /data, and points root directory for docker and containerd daemons to /data/docker and /data/containerd respectively.

@id id requested review from a team as code owners February 27, 2025 14:31
@id id force-pushed the use-dynamodb-ephemeral-multi-arch-prebuilt-example branch 2 times, most recently from f3d5721 to 89cac18 Compare February 27, 2025 17:21
Add extra capabilities to support ephemeral multiarch runners.
In workflow jobs with large matrix, github can request large number of
runners simultaneously. This can cause the ssm to be overwhelmed.
This PR replaces the ssm with dynamodb to store the JIT config.

Also fixed a couple things:
- `runners_ssm_housekeeper` schema was not aligned across modules
- syntax for `env` in `github_agent.ubuntu.pkr.hcl`

Added a couple of improvements:
- updated baseline provisioning script for prebuilt runners to make it
closer to what github hosted runners do
- startup-runner.sh now automatically discovers attached ebs volume
and mounts it to /data
@id id force-pushed the use-dynamodb-ephemeral-multi-arch-prebuilt-example branch from 1711979 to 16a67b8 Compare February 28, 2025 11:10
@npalm
Copy link
Member

npalm commented Mar 1, 2025

@id thanks for creating the PR. It is abit one so need to find some time to review. I see you also added a lambda for cleaning up github left overs. Would it be possible to split this in a sperate PR? Would help the review and testing part.

Are you on our discord server? Maybe a short chat about some question is quicker when I go over the code.

@npalm
Copy link
Member

npalm commented Mar 7, 2025

Not much time this week. Although I had a quick look. Before I dig in the details there are some concerns, I have.

  • The biggest issue is seucurity related. By storing token in this way any runner can access any token in dynamodb. This allows malious users ti steal a and register a mialious runner. See for example this security incident: GHSA-8rp4-w85f-5qh2
  • The main driver is the limitation of SSM. Are you aware that you can avoid the linmitation with the advanced Paramater store. This comes with an extra costs, but extra code, and scenarios to test coming with a costs as well. As you see we are running the project already with a very limited amount of available time. Don't keep me wrong, like the work this is not a real blocker. But someting that needs some more thinking.
  • Would prefer if you can split the PR in multiple. At least split the work you did to clean up GitHub. For this part I need to check for side effects. The module support JIT as well non JIT token. Both working different with registration of a runner if I remember correctly. For JIT the runner should be removed when the task is finished.

@id
Copy link
Author

id commented Mar 8, 2025

@npalm thanks for checking this out. Yes, I'll split the PR into more sensible once. Hopefully will have time next week.

About DynamoDB, I think we could use something like this in combination with using full isntance ARN for the keys instead of just instance id as it is now. I'll test this, and will report back.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Action": [
				"dynamodb:GetItem",
				"dynamodb:Query"
			],
			"Resource": "arn:aws:dynamodb:us-east-2:***:table/TABLE_NAME",
			"Condition": {
				"ForAllValues:StringEquals": {
					"dynamodb:LeadingKeys": [
						"${ec2:SourceInstanceArn}"
					]
				}
			}
		}
	]
}

@id
Copy link
Author

id commented Mar 8, 2025

If DynamoDB turns out to be working ok, we could as a minimum get rid of ssm housekeeper, since in DynamoDB we have built in TTL for items.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants