
Move v4 TMDS container and task stats endpoint handlers to ecs-agent module #3791

Merged (7 commits, Jul 11, 2023)

amogh09 (Contributor) commented Jul 7, 2023

Summary

Moves the v4 TMDS container stats and task stats endpoint handlers from the agent module to the ecs-agent module.

The only difference between the old and new handlers is that the new handlers support publishing metrics: they publish an Internal Server Error metric whenever a 5XX error occurs while handling a request. Agent injects a no-op metrics publisher when consuming the new handlers, so this PR introduces no functional change.

Implementation details

ecs-agent module:

  • Add two new methods, GetContainerStats and GetTaskStats, to the AgentState interface for getting container and task stats, respectively, from Agent.
  • Add two new HTTP handlers, ContainerStatsHandler and TaskStatsHandler, for the container stats and task stats endpoints, respectively. The handlers depend on the new AgentState methods.
  • Both new handlers follow the same pattern:
    1. read container ID from the request,
    2. get stats by calling the appropriate AgentState method, and
    3. handle any errors and write response.
  • The common workflow of the new handlers is captured in a new generic statsHandler function, which serves either endpoint depending on the parameters it is given.
  • Add tests for the new handlers.

agent module:

  • Implement the new AgentState interface methods for the TMDSAgentState type, based on the existing implementation of the stats endpoint handlers.
  • Update TMDS setup to consume the v4 stats handlers from the ecs-agent module. A no-op metrics publisher is injected so that the handlers publish no metrics.
  • Delete the old stats endpoint handlers.

Testing

Test-driven development was followed. Test coverage for the container and task stats endpoints was improved in #3758 and #3761, before these changes, to capture the current customer-facing behavior of the endpoints. The same tests pass on this PR, so there is no regression.

Manual regression, stress, and performance tests were also performed. For manual testing, Agent was built from the source of this PR and deployed to an EC2 instance. A second EC2 instance was provisioned with the released Agent version v1.72.0.

For manual regression testing, the container and task stats endpoints were called on both instances and the results were compared using diff. No regression was detected.

For manual stress testing, the endpoints were called at a rate of 3000 rps for 60 seconds. 100% of requests were successful. Note that the default TMDS rate limits are 40 steady and 60 burst, so 3000 rps is 75 times the default steady-state limit.

For manual performance testing, both Agents were profiled for heap and CPU usage while the stress tests described above were running. Both Agents showed similar heap and CPU usage.

New tests cover the changes: yes

Description for the changelog

Move v4 TMDS container and task stats endpoint handlers to ecs-agent module

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

amogh09 marked this pull request as ready for review July 7, 2023 22:19
amogh09 requested a review from a team as a code owner July 7, 2023 22:19
Resolved review threads: agent/handlers/v4/tmdsstate.go (outdated), ecs-agent/tmds/handlers/v4/handlers.go, ecs-agent/tmds/handlers/v4/state/state.go (two threads)
amogh09 merged commit 1e72259 into aws:dev Jul 11, 2023
4 participants