Add ECS Workload Agent #725

Merged: 1 commit merged into bottlerocket-os:develop on Jan 13, 2023

@stmcginnis (Contributor) commented Dec 21, 2022

Issue number:

Closes: #617

Description of changes:

This adds a new test agent for running workload testing in an ECS cluster.

Testing done:

See details of testing in comment below

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@stmcginnis marked this pull request as draft December 21, 2022 23:51

@stmcginnis (Contributor, Author):

Note to self: always, always, always rebuild after rebasing.

@stmcginnis force-pushed the ecs-workload-agent branch 2 times, most recently from cea6e9a to 43d48ca, December 23, 2022 14:51

@stmcginnis (Contributor, Author):

There is still testing to be done, but I'm marking this as ready for review since I think the error is in my local test setup and not necessarily in the agent itself.

@stmcginnis marked this pull request as ready for review December 23, 2022 14:54

bottlerocket/types/src/agent_config.rs (outdated, resolved)
bottlerocket/types/src/agent_config.rs (resolved)
bottlerocket/agents/src/bin/ecs-workload-agent/main.rs (outdated, resolved)
bottlerocket/agents/src/bin/ecs-workload-agent/main.rs (outdated, resolved)
bottlerocket/agents/src/bin/k8s-workload-agent/main.rs (outdated, resolved)
@ecpullen self-requested a review January 4, 2023 14:26
@stmcginnis force-pushed the ecs-workload-agent branch 3 times, most recently from 0e19fef to 1ca83cb, January 5, 2023 08:42
@ecpullen self-requested a review January 5, 2023 17:05
This adds a new test agent for running workload testing in an ECS
cluster.

Signed-off-by: Sean McGinnis <[email protected]>

@ecpullen (Contributor) left a comment:

LGTM. Can you verify that the nvidia test works with this agent?

@stmcginnis (Contributor, Author):

> LGTM. Can you verify that the nvidia test works with this agent?

I was able to deploy an ECS cluster with instances running Bottlerocket 1.11.1 (the aws-ecs-1-nvidia variant) using the following:

apiVersion: testsys.system/v1
kind: Test
metadata:
  name: ecs-workload
  namespace: testsys
spec:
  agent:
    configuration:
      region: "us-east-2"
      clusterName: "brtest"
      tests:
      - name: nvidia-workload
        image: public.ecr.aws/p0f9g6e8/gpu-tests:latest
        gpu: true
    image: ecs-workload-agent:demo
    name: ecs-workload-test-agent
    keepRunning: true
    timeout: "5000"
    secrets:
      awsCredentials: "aws-creds"
  resources:
  - brtest-instances
  - brtest
  dependsOn: []
---
apiVersion: testsys.system/v1
kind: Resource
metadata:
  name: brtest
  namespace: testsys
spec:
  agent:
    name: ecs-provider
    image: ecs-resource-agent:demo
    keepRunning: true
    configuration:
      clusterName: brtest
      region: "us-east-2"
    secrets:
      awsCredentials: "aws-creds"
  dependsOn: []
  destructionPolicy: never
---
apiVersion: testsys.system/v1
kind: Resource
metadata:
  name: brtest-instances
  namespace: testsys
spec:
  agent:
    name: ec2-provider
    image: ec2-resource-agent:demo
    keepRunning: true
    configuration:
      clusterName: brtest
      clusterType: ecs
      instanceCount: 2
      instanceProfileArn: ${brtest.iamInstanceProfileArn}
      nodeAmi: "ami-0ba78cb33c07e2333"
      region: "us-east-2"
      subnetIds: ${brtest.publicSubnetIds}
      instanceTypes: ["g4dn.xlarge"]
    secrets:
      awsCredentials: "aws-creds"
  dependsOn: [brtest]
  destructionPolicy: never
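
The three objects above form a chain: `brtest-instances` lists `brtest` in `dependsOn`, and the `ecs-workload` test names both under `resources`. As a rough illustration of how such declarations imply a creation order, here is a minimal sketch. It is not TestSys's actual controller logic: `creation_order` is a hypothetical helper, and treating the test's `resources` list as dependencies is an assumption made for this example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: returns the object names in an order where each
/// object appears after everything it depends on. Returns `None` if there
/// is a cycle or a dependency that is not itself declared.
fn creation_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    let mut done: BTreeSet<&str> = BTreeSet::new();
    let mut order = Vec::new();

    while done.len() < deps.len() {
        // Collect every pending object whose dependencies are all created.
        let ready: Vec<&str> = deps
            .iter()
            .filter(|(name, needs)| {
                !done.contains(*name) && needs.iter().all(|d| done.contains(d))
            })
            .map(|(name, _)| *name)
            .collect();

        // No progress means a cycle or an undeclared dependency.
        if ready.is_empty() {
            return None;
        }

        for name in ready {
            done.insert(name);
            order.push(name.to_string());
        }
    }

    Some(order)
}
```

Given the objects above (with the test's `resources` treated as dependencies), this would order `brtest` first, then `brtest-instances`, then `ecs-workload`.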

I was able to create the cluster, launch the EC2 instances and join them to it, and run the NVIDIA workload test. Logs from the workload test:

$ k -n testsys logs pod/ecs-workload-qzxnr
[2023-01-09T16:27:02Z INFO  ecs_workload_agent] Initializing ECS workload test agent...
[2023-01-09T16:27:02Z INFO  ecs_workload_agent] Waiting for registered container instances...
[2023-01-09T16:27:02Z INFO  ecs_workload_agent] Waiting for cluster to have registered instances
[2023-01-09T16:27:04Z INFO  ecs_workload_agent] Waiting for cluster to have registered instances
[2023-01-09T16:27:06Z INFO  ecs_workload_agent] Waiting for cluster to have registered instances
[2023-01-09T16:27:08Z INFO  ecs_workload_agent] Waiting for cluster to have registered instances
[2023-01-09T16:27:09Z INFO  ecs_workload_agent] Running task 'arn:aws:ecs:us-east-2:861807767978:task-definition/testsys-bottlerocket-nvidia-workload:1'
[2023-01-09T16:27:09Z INFO  ecs_workload_agent] Waiting for tasks to complete...
[2023-01-09T16:27:57Z INFO  test_agent::agent] Test execution finished without returning an error.
[2023-01-09T16:27:57Z INFO  test_agent::agent] Test output tarball created.
[2023-01-09T16:27:57Z INFO  test_agent::agent] 'keep_running' is true.

The stopped task in the ECS cluster shows that it exited with exit code 0.
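
The repeated "Waiting for cluster to have registered instances" lines in the log, spaced about two seconds apart, suggest a simple poll-and-sleep loop. The following is a minimal sketch of that pattern under stated assumptions, not the agent's actual code: `wait_until` is a hypothetical helper, the readiness check is passed in as a closure in place of a real ECS API call, and the bounded attempt count is an assumption.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: polls `check` until it returns true or until
/// `max_attempts` polls have been made. On success, returns the number of
/// polls that were needed.
fn wait_until<F>(mut check: F, interval: Duration, max_attempts: u32) -> Result<u32, String>
where
    F: FnMut() -> bool,
{
    for attempt in 1..=max_attempts {
        if check() {
            return Ok(attempt);
        }
        // Mirrors the log message seen above; the real agent logs via its own logger.
        println!("Waiting for cluster to have registered instances");
        // Do not sleep after the final failed attempt.
        if attempt < max_attempts {
            sleep(interval);
        }
    }
    Err(format!("condition not met after {} attempts", max_attempts))
}
```

A caller would pass a closure that asks the cluster how many container instances are registered and returns true once that count is nonzero, for example `wait_until(|| registered_count() > 0, Duration::from_secs(2), 30)`, where `registered_count` is a placeholder for that query.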

@ecpullen requested a review from etungsten January 11, 2023 15:38
@stmcginnis merged commit 529cc04 into bottlerocket-os:develop Jan 13, 2023
@stmcginnis deleted the ecs-workload-agent branch January 13, 2023 08:17
Linked issue: workload test agent for ECS
3 participants