ecs_taskdefinition throws a ThrottlingException when the task definition has a large number of revisions #2123

Closed
1 task done
eacherkan-aternity opened this issue Jul 16, 2024 · 0 comments · Fixed by #2124

Comments

@eacherkan-aternity
Contributor

Summary

Running ecs_taskdefinition on a task definition with a large number of revisions (>1000) results in a ThrottlingException. The failure is intermittent: it occurs roughly once every few runs rather than on every run.

Issue Type

Bug Report

Component Name

ecs_taskdefinition

Ansible Version

$ ansible --version
ansible [core 2.15.12]
  config file = /runner/ansible/ansible.cfg
  configured module search path = ['/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /runner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.19 (main, Jun 11 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] (/usr/bin/python3)
  jinja version = 3.1.4
  libyaml = True

Collection Versions

$ ansible-galaxy collection list
# /runner/.ansible/collections/ansible_collections
Collection              Version
----------------------- -------
amazon.aws              7.2.0  
community.aws           7.2.0  
community.crypto        2.21.0 
community.general       9.2.0  
community.postgresql    3.4.1  
community.windows       2.2.0  

# /usr/share/ansible/collections/ansible_collections
Collection              Version
----------------------- -------
amazon.aws              8.1.0  
ansible.posix           1.5.4  
ansible.windows         2.4.0  
awx.awx                 24.6.1 
azure.azcollection      2.6.0  
community.vmware        4.4.0  
google.cloud            1.3.0  
kubernetes.core         4.0.0  
kubevirt.core           1.4.0  
openstack.cloud         2.2.0  
ovirt.ovirt             3.2.0  
redhatinsights.insights 1.2.2  
theforeman.foreman      4.0.0  

AWS SDK versions

$ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: [email protected]
License: MIT
Location: /runner/.local/lib/python3.9/site-packages
Requires: 
Required-by: 
---
Name: boto3
Version: 1.34.138
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /usr/local/lib/python3.9/site-packages
Requires: botocore, jmespath, s3transfer
Required-by: 
---
Name: botocore
Version: 1.34.138
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /usr/local/lib/python3.9/site-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: boto3, s3transfer

Configuration

$ ansible-config dump --only-changed
ANSIBLE_NOCOWS(/runner/ansible/ansible.cfg) = True
CONFIG_FILE() = /runner/ansible/ansible.cfg
DEFAULT_LOAD_CALLBACK_PLUGINS(/runner/ansible/ansible.cfg) = True
DEFAULT_PRIVATE_KEY_FILE(/runner/ansible/ansible.cfg) = /runner/ansible/playbooks/ssh/id_rsa_ansible
DEFAULT_REMOTE_USER(/runner/ansible/ansible.cfg) = ec2-user
DEFAULT_ROLES_PATH(/runner/ansible/ansible.cfg) = ['/runner/ansible/roles']
DEFAULT_STDOUT_CALLBACK(/runner/ansible/ansible.cfg) = yaml
DEFAULT_TIMEOUT(/runner/ansible/ansible.cfg) = 30
DEPRECATION_WARNINGS(/runner/ansible/ansible.cfg) = True
HOST_KEY_CHECKING(/runner/ansible/ansible.cfg) = False

OS / Environment

Docker (quay.io/ansible/awx-ee:latest image) on Amazon Linux 2

Steps to Reproduce

- name: new task definition
  ecs_taskdefinition:
    launch_type: "EC2" 
    region: "{{ env_region }}"
    containers:
    - name: "{{ containerName }}"
      cpu: "{{ service_vars.cpu }}"
      memory: "{{ service_vars.memory | default(omit) }}"
      linuxParameters: "{{ service_vars.linuxParameters }}"
      essential: "{{ service_vars.essential }}"
      image: "{{ service_vars.image }}"
      memoryReservation: "{{ service_vars.memoryReservation }}"
      links: "{{ service_vars.links }}"
      mountPoints: "{{ mountPoints_vars }}"
      portMappings: "{{ service_vars.portMappings }}"
      logConfiguration: "{{ service_vars.logConfiguration }}"
      environment: "{{ environment_vars }}"
      dnsSearchDomains: "{{ service_vars.dnsSearchDomains }}"
      entryPoint: "{{ service_vars.entryPoint }}"
      command: "{{ service_vars.command }}"
      ulimits: "{{ service_vars.ulimits }}"
      dnsServers: "{{ service_vars.dnsServers }}"
      disableNetworking: "{{ service_vars.disableNetworking }}"
      privileged: "{{ service_vars.privileged }}"
      readonlyRootFilesystem: "{{ service_vars.readonlyRootFilesystem }}"
      extraHosts: "{{ service_vars.extraHosts }}"
      dockerSecurityOptions: "{{ service_vars.dockerSecurityOptions }}"
      dockerLabels: "{{ service_vars.dockerLabels }}"
      systemControls: "{% if service_vars.systemControls is defined %}{{ service_vars.systemControls }}{% else %}{{ [] }}{% endif %}"
      healthCheck: "{{ ecs_health_check_definition }}"
    runtime_platform:
      cpuArchitecture: "{{ service_vars.container_architecture | default('X86_64') }}"
      operatingSystemFamily: "LINUX"
    network_mode: "{{ service_vars.network_mode }}"
    volumes: "{{ volumes_vars }}"
    family: "{{ service_vars.td_family }}"
    state: present
    revision: "{{ td_revision }}"
    task_role_arn: "{% if service_vars.task_role_arn is defined %}{{ service_vars.task_role_arn }}{% else %}{% endif %}"
  register: current_taskdefinition
  when: service_vars.enabled

Expected Results

Task definition updated successfully (new revision created).

Actual Results

TASK [ecs-create-taskdefinition : new task definition] ***
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the DescribeTaskDefinition operation (reached max retries: 4): Rate exceeded
fatal: [localhost]: FAILED! => changed=false 
  module_stderr: |-
    Traceback (most recent call last):
      File "/runner/.ansible/tmp/ansible-tmp-1721118381.0912366-1713-84717120046822/AnsiballZ_ecs_taskdefinition.py", line 107, in <module>
        _ansiballz_main()
      File "/runner/.ansible/tmp/ansible-tmp-1721118381.0912366-1713-84717120046822/AnsiballZ_ecs_taskdefinition.py", line 99, in _ansiballz_main
        invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
      File "/runner/.ansible/tmp/ansible-tmp-1721118381.0912366-1713-84717120046822/AnsiballZ_ecs_taskdefinition.py", line 47, in invoke_module
        runpy.run_module(mod_name='ansible_collections.community.aws.plugins.modules.ecs_taskdefinition', init_globals=dict(_module_fqn='ansible_collections.community.aws.plugins.modules.ecs_taskdefinition', _modlib_path=modlib_path),
      File "/usr/lib64/python3.9/runpy.py", line 225, in run_module
        return _run_module_code(code, init_globals, run_name, mod_spec)
      File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/tmp/ansible_ecs_taskdefinition_payload_2avgar8z/ansible_ecs_taskdefinition_payload.zip/ansible_collections/community/aws/plugins/modules/ecs_taskdefinition.py", line 1240, in <module>
      File "/tmp/ansible_ecs_taskdefinition_payload_2avgar8z/ansible_ecs_taskdefinition_payload.zip/ansible_collections/community/aws/plugins/modules/ecs_taskdefinition.py", line 1052, in main
      File "/tmp/ansible_ecs_taskdefinition_payload_2avgar8z/ansible_ecs_taskdefinition_payload.zip/ansible_collections/community/aws/plugins/modules/ecs_taskdefinition.py", line 931, in describe_task_definitions
      File "/tmp/ansible_ecs_taskdefinition_payload_2avgar8z/ansible_ecs_taskdefinition_payload.zip/ansible_collections/community/aws/plugins/modules/ecs_taskdefinition.py", line 932, in <listcomp>
      File "/tmp/ansible_ecs_taskdefinition_payload_2avgar8z/ansible_ecs_taskdefinition_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/retries.py", line 107, in deciding_wrapper
      File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 535, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 983, in _make_api_call
        raise error_class(parsed_response, operation_name)
    botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the DescribeTaskDefinition operation (reached max retries: 4): Rate exceeded
  module_stdout: ''
  msg: |-
    MODULE FAILURE
    See stdout/stderr for the exact error
  rc: 1
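
The traceback places the failure inside a list comprehension in describe_task_definitions, which suggests one DescribeTaskDefinition call per revision with no retry around it. The following is a minimal simulation of that pattern, not the module's code. It assumes a fixed per-window call quota; `FakeEcsClient`, `ThrottledError`, the quota of 1000, and the `my-family` name are all hypothetical and say nothing about real AWS limits.

```python
# Simulation only: no AWS calls are made. FakeEcsClient and ThrottledError
# are hypothetical stand-ins, not botocore classes.


class ThrottledError(Exception):
    """Stands in for a ClientError whose code is ThrottlingException."""


class FakeEcsClient:
    """Rejects every call once an assumed per-window quota is used up."""

    def __init__(self, calls_per_window):
        self.calls_per_window = calls_per_window
        self.calls_in_window = 0

    def describe_task_definition(self, taskDefinition):
        self.calls_in_window += 1
        if self.calls_in_window > self.calls_per_window:
            raise ThrottledError("Rate exceeded")
        return {"taskDefinition": {"taskDefinitionArn": taskDefinition}}


def describe_all(client, arns):
    # One API call per revision, issued back to back with no retry handling.
    return [client.describe_task_definition(taskDefinition=arn) for arn in arns]


def try_describe(revision_count, quota=1000):
    """Return the number of revisions described, or the error message."""
    client = FakeEcsClient(calls_per_window=quota)
    arns = [f"my-family:{n}" for n in range(1, revision_count + 1)]
    try:
        return len(describe_all(client, arns))
    except ThrottledError as exc:
        return str(exc)
```

In this simulation a family with few revisions stays under the quota, while a family with more revisions than the quota always fails. Real throttling depends on the account's request rate at that moment, which would be consistent with the failure appearing only on some runs.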

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
patchback bot pushed a commit that referenced this issue Jul 23, 2024
)

SUMMARY

Fixes #2123 by adding aws_retry=True to the API calls.

ISSUE TYPE

Bugfix Pull Request

COMPONENT NAME

ecs_taskdefinition

ADDITIONAL INFORMATION

We observed that ecs_taskdefinition intermittently causes a ThrottlingException when running on a task definition with a large number of revisions. Looking at the code, it appears that describe_task_definitions loops over the revisions without using the retry mechanism. This PR attempts to solve the problem by adding aws_retry=True to the API calls.
Due to the nature of the problem (intermittent throttling by AWS), I couldn't devise automated tests that validate the fix.

Reviewed-by: Alina Buzachis
Reviewed-by: Mark Chappell
Reviewed-by: Eli Acherkan
(cherry picked from commit 97131ec)
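
The commit message describes the fix as passing aws_retry=True to the API calls. This page does not show how the collection implements its retries, so the following is only a generic sketch of the underlying idea: retrying a throttled call with jittered exponential backoff. `call_with_backoff`, `ThrottledError`, `FlakyDescribe`, and all default values are hypothetical names and numbers chosen for illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling ClientError."""


def call_with_backoff(func, *args, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on ThrottledError with jittered exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except ThrottledError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # The cap doubles each attempt; full jitter picks a delay in [0, cap].
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


class FlakyDescribe:
    """Test double: fails the first `failures` calls, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self, arn):
        self.calls += 1
        if self.calls <= self.failures:
            raise ThrottledError("Rate exceeded")
        return {"taskDefinitionArn": arn}
```

A call such as `call_with_backoff(FlakyDescribe(failures=2), "my-family:1")` succeeds on the third attempt after two short waits. Passing a custom `sleep` callable keeps the sketch testable without real delays.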
softwarefactory-project-zuul bot pushed a commit that referenced this issue Aug 1, 2024
) (#2129)

This is a backport of PR #2124 as merged into main (97131ec).

SUMMARY

Fixes #2123 by adding aws_retry=True to the API calls.

ISSUE TYPE

Bugfix Pull Request

COMPONENT NAME

ecs_taskdefinition

ADDITIONAL INFORMATION

We observed that ecs_taskdefinition intermittently causes a ThrottlingException when running on a task definition with a large number of revisions. Looking at the code, it appears that describe_task_definitions loops over the revisions without using the retry mechanism. This PR attempts to solve the problem by adding aws_retry=True to the API calls.
Due to the nature of the problem (intermittent throttling by AWS), I couldn't devise automated tests that validate the fix.

Reviewed-by: Markus Bergholz <[email protected]>