Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement TextMarshaler for ContainerStatus #4135

Merged
merged 1 commit into from
Apr 9, 2024

Conversation

amogh09
Copy link
Contributor

@amogh09 amogh09 commented Apr 5, 2024

Summary

ContainerStatus currently does not implement encoding.TextMarshaler interface which causes it to be marshaled as integers (such as "1") instead of strings (such as "PULLED") in some cases that depend on TextMarshaler interface. An example of such a case is when ContainerStatus is a key in a map. Due to this issue, TransitionDependenciesMap type, which is a type alias for map[ContainerStatus]TransitionDependencySet, is marshaled with integer keys as shown below.

{
  "1": {
    "ContainerDependencies": [
      {
        "ContainerName": "container",
        "SatisfiedStatus": "RUNNING"
      }
    ]
  }
}

This behavior of json.Marshal function is documented.

Map values encode as JSON objects. The map's key type must either be a string, an integer type, or implement encoding.TextMarshaler. The map keys are sorted and used as JSON object keys by applying the following rules, subject to the UTF-8 coercion described for string values above:

  • keys of any string type are used directly
  • encoding.TextMarshalers are marshaled
  • integer keys are converted to strings

It is problematic to marshal ContainerStatus as integers because when a new container state is introduced data marshaled by an older Agent version will be silently unmarshaled incorrectly by a newer Agent version. For example, an older Agent version would marshal ContainerPulled state as "1" but if a new state is introduced between ContainerStatusNone and ContainerPulled then a newer Agent version will unmarshal "1" as that new state. We will be introducing a new container state in an upcoming PR which is why this PR is needed first.

To solve this issue, this PR implements encoding.TestMarshaler and its counterpart encoding.TestUnmarshaler interfaces for ContainerStatus type so that text marshaling of ContainerStatus uses strings instead of integers.

On the other hand, when unmarshaling JSON object keys, json.Unmarshal function gives preference to json.Unmarshaler interface over encoding.TextUnmarshaler.

To unmarshal a JSON object into a map, Unmarshal first establishes a map to use. If the map is nil, Unmarshal allocates a new map. Otherwise Unmarshal reuses the existing map, keeping existing entries. Unmarshal then stores key-value pairs from the JSON object into the map. The map's key type must either be any string type, an integer, implement json.Unmarshaler, or implement encoding.TextUnmarshaler.

For this reason, we need to make ContainerStatus.UnmarshalJSON method backwards-compatible with integer based marshaling of ContainerStatus. This PR adds code in UnmarshalJSON to handle integer to ContainerStatus mapping for old state files.

Implementation details

  • Implement MarshalText method for ContainerStatus. The method has a value receiver as otherwise the method is not invoked for ContainerStatus values. The method is invoked for pointers to ContainerStatus too so both cases are covered. The method simply calls String() method.
  • Implement UnmarshalText method for ContainerStatus which unmarshals ContainerStatus from its text form.
  • Update MarshalJSON method of ContainerStatus to capture current integer to ContainerStatus mapping and use it unmarshal ContainerStatus from its old integer based marshaling. The mapping from integer to ContainerStatus does not need to be updated as we add new states as it is only for backwards compatibility with old Agent state files that won't ever have new states.
  • Add new tests for TransitionDependenciesMap type to ensure that ContainerStatus keys are marshaled as strings now and unmarshaling from older formats works (for backward-compatibility).

Forward compatibility

This change is not forward compatible (but it is backward compatible) with older Agent versions as Agent's state loading logic does not take into account the version of the persisted state. This is true for all changes made to state format and contents. If Agent is downgraded after running an Agent version with this change, then the downgraded version will not have any state transition dependencies loaded for any containers. Agent will still start-up and run.

The following error will show in the logs.

level=debug time=2024-04-08T14:48:13Z msg="Unmarshal 'TransitionDependencySet': {\"PULLED\":{\"ContainerDependencies\":[{\"ContainerName\":\"~internal~ecs~pause\",\"SatisfiedStatus\":\"RESOURCES_PROVISIONED\"}],\"ResourceDependencies\":[{\"Name\":\"cgroup\",\"RequiredStatus\":1}]}}, not a map: json: cannot unmarshal number PULLED into Go value of type status.ContainerStatus" module=transitiondependency.go
level=debug time=2024-04-08T14:48:13Z msg="Unmarshal 'TransitionDependencySet': {\"STOPPED\":{\"ContainerDependencies\":[{\"ContainerName\":\"sleep\",\"SatisfiedStatus\":\"STOPPED\"}],\"ResourceDependencies\":null}}, not a map: json: cannot unmarshal number STOPPED into Go value of type status.ContainerStatus" module=transitiondependency.go
level=debug time=2024-04-08T14:48:13Z msg="Unmarshal 'TransitionDependencySet': {\"PULLED\":{\"ContainerDependencies\":[{\"ContainerName\":\"~internal~ecs~pause\",\"SatisfiedStatus\":\"RESOURCES_PROVISIONED\"}],\"ResourceDependencies\":[{\"Name\":\"cgroup\",\"RequiredStatus\":1}]}}, not a map: json: cannot unmarshal number PULLED into Go value of type status.ContainerStatus" module=transitiondependency.go
level=debug time=2024-04-08T14:48:13Z msg="Unmarshal 'TransitionDependencySet': {\"STOPPED\":{\"ContainerDependencies\":[{\"ContainerName\":\"sleep\",\"SatisfiedStatus\":\"STOPPED\"}],\"ResourceDependencies\":null}}, not a map: json: cannot unmarshal number STOPPED into Go value of type status.ContainerStatus" module=transitiondependency.go

Testing

  • Thorough unit tests are added.

Performed the following manual test.

  • Ran an Agent without the changes in this PR and launched an awsvpc task. Awsvpc tasks make use of TransitionDependenciesMap to ensure that task containers are started after the pause container reaches RUNNING and the pause container is stopped after task containers reach STOPPED.
  • While the task is running, stopped Agent and started a new Agent with the changes in this PR. Some extra logs were added to observe parsing of containers from the state file. Observed the logs below.
level=debug time=2024-04-05T17:30:00Z msg="Parsing container: {\"DockerId\":\"3ac5b45fca171d7047a7b870c26ce5765c42a9d0d02b9adbf7bafe31c478a435\",\"DockerName\":\"ecs-tss-sleep-awsvpc-2-sleep-f0b4b7c6c0cfe1b23a00\",\"Container\":{\"Name\":\"sleep\",\"RuntimeID\":\"3ac5b45fca171d7047a7b870c26ce5765c42a9d0d02b9adbf7bafe31c478a435\",\"taskARN\":\"arn:aws:ecs:us-west-2:979604884904:task/test/53a8a19f210b43a0aa727f7d3b3d6865\",\"V3EndpointID\":\"fe5d2f12-9ec9-4ac5-81f4-454118021fe2\",\"Image\":\"busybox:latest\",\"ImageID\":\"sha256:ba5dc23f65d4cc4a4535bce55cf9e63b068eb02946e3422d3587e8ce803b6aab\",\"ImageDigest\":\"sha256:650fd573e056b679a5110a70aabeb01e26b76e545ec4b9c70a9523f2dfaf18c6\",\"Command\":[\"sh\",\"-c\",\"sleep 600\"],\"Cpu\":128,\"GPUIDs\":null,\"Memory\":16,\"Links\":null,\"firelensConfiguration\":null,\"volumesFrom\":[],\"mountPoints\":[],\"portMappings\":null,\"secrets\":null,\"Essential\":true,\"EntryPoint\":null,\"environment\":{\"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI\":\"/v2/credentials/1b9f82aa-68d8-48b7-aa93-9e9c50b7d3d8\",\"AWS_EXECUTION_ENV\":\"AWS_ECS_EC2\",\"ECS_AGENT_URI\":\"http://169.254.170.2/api/fe5d2f12-9ec9-4ac5-81f4-454118021fe2\",\"ECS_CONTAINER_METADATA_URI\":\"http://169.254.170.2/v3/fe5d2f12-9ec9-4ac5-81f4-454118021fe2\",\"ECS_CONTAINER_METADATA_URI_V4\":\"http://169.254.170.2/v4/fe5d2f12-9ec9-4ac5-81f4-454118021fe2\"},\"environmentFiles\":null,\"overrides\":{\"command\":null},\"dockerConfig\":{\"config\":\"{}\",\"hostConfig\":\"{\\\"NetworkMode\\\":\\\"awsvpc\\\",\\\"CapAdd\\\":[],\\\"CapDrop\\\":[],\\\"Sysctls\\\":{}}\",\"version\":\"1.18\"},\"registryAuthentication\":null,\"LogsAuthStrategy\":\"\",\"StartTimeout\":0,\"StopTimeout\":0,\"desiredStatus\":\"RUNNING\",\"KnownStatus\":\"RUNNING\",\"TransitionDependencySet\":{\"1\":{\"ContainerDependencies\":[{\"ContainerName\":\"~internal~ecs~pause\",\"SatisfiedStatus\":\"RESOURCES_PROVISIONED\"}],\"ResourceDependencies\":[{\"Name\":\"cgroup\",\"RequiredStatus\":1}]}},\"RunDependencies\":null,\"IsInternal\":\"NORMAL\",\"ApplyingError\":null,\"SentStatus\":\"RUNNING\",\"metadataFileUpdated\":false,\"KnownExitCode\":null,\"KnownPortBindings\":null,\"ContainerArn\":\"arn:aws:ecs:us-west-2:979604884904:container/test/53a8a19f210b43a0aa727f7d3b3d6865/7ffcfd76-0bc8-4cdd-8651-024e383973b2\",\"containerTornDown\":false,\"ContainerHasPortRange\":false,\"ContainerPortSet\":{},\"ContainerPortRangeMap\":{}}}" module=container_client.go
level=debug time=2024-04-05T17:30:00Z msg="Parsed container transition dependencies: map[PULLED:{ContainerDependencies:[{ContainerName:~internal~ecs~pause SatisfiedStatus:RESOURCES_PROVISIONED DependentStatus:NONE}] ResourceDependencies:[{Name:cgroup RequiredStatus:1}]}]" module=container_client.go
level=debug time=2024-04-05T17:30:00Z msg="Parsing container: {\"DockerId\":\"9834bcaed23aafd2c9d19fad81ba4bda145600e3e5fe877046aff28ec598cb29\",\"DockerName\":\"ecs-tss-sleep-awsvpc-2-internalecspause-bca291b5db8f88fd3400\",\"Container\":{\"Name\":\"~internal~ecs~pause\",\"RuntimeID\":\"9834bcaed23aafd2c9d19fad81ba4bda145600e3e5fe877046aff28ec598cb29\",\"taskARN\":\"arn:aws:ecs:us-west-2:979604884904:task/test/53a8a19f210b43a0aa727f7d3b3d6865\",\"V3EndpointID\":\"\",\"Image\":\"amazon/amazon-ecs-pause:0.1.0\",\"ImageID\":\"\",\"ImageDigest\":\"\",\"Command\":null,\"Cpu\":0,\"GPUIDs\":null,\"Memory\":0,\"Links\":null,\"firelensConfiguration\":null,\"volumesFrom\":null,\"mountPoints\":null,\"portMappings\":null,\"secrets\":null,\"Essential\":true,\"EntryPoint\":null,\"environment\":null,\"environmentFiles\":null,\"overrides\":{\"command\":null},\"dockerConfig\":{\"config\":null,\"hostConfig\":null,\"version\":null},\"registryAuthentication\":null,\"LogsAuthStrategy\":\"\",\"StartTimeout\":0,\"StopTimeout\":0,\"desiredStatus\":\"RESOURCES_PROVISIONED\",\"KnownStatus\":\"RESOURCES_PROVISIONED\",\"TransitionDependencySet\":{\"5\":{\"ContainerDependencies\":[{\"ContainerName\":\"sleep\",\"SatisfiedStatus\":\"STOPPED\"}],\"ResourceDependencies\":null}},\"RunDependencies\":null,\"IsInternal\":\"CNI_PAUSE\",\"ApplyingError\":null,\"SentStatus\":\"NONE\",\"metadataFileUpdated\":false,\"KnownExitCode\":null,\"KnownPortBindings\":null,\"SteadyStateStatus\":\"RESOURCES_PROVISIONED\",\"containerTornDown\":false,\"ContainerHasPortRange\":false,\"ContainerPortSet\":{},\"ContainerPortRangeMap\":{}}}" module=container_client.go
level=debug time=2024-04-05T17:30:00Z msg="Parsed container transition dependencies: map[STOPPED:{ContainerDependencies:[{ContainerName:sleep SatisfiedStatus:STOPPED DependentStatus:NONE}] ResourceDependencies:[]}]" module=container_client.go

The logs show that the new Agent parsed

\"TransitionDependencySet\":{\"1\":{\"ContainerDependencies\":[{\"ContainerName\":\"~internal~ecs~pause\",\"SatisfiedStatus\":\"RESOURCES_PROVISIONED\"}],\"ResourceDependencies\":[{\"Name\":\"cgroup\",\"RequiredStatus\":1}]}}

as

map[PULLED:{ContainerDependencies:[{ContainerName:~internal~ecs~pause SatisfiedStatus:RESOURCES_PROVISIONED DependentStatus:NONE}] ResourceDependencies:[{Name:cgroup RequiredStatus:1}]}]

and

\"TransitionDependencySet\":{\"5\":{\"ContainerDependencies\":[{\"ContainerName\":\"sleep\",\"SatisfiedStatus\":\"STOPPED\"}],\"ResourceDependencies\":null}}

as

map[STOPPED:{ContainerDependencies:[{ContainerName:sleep SatisfiedStatus:STOPPED DependentStatus:NONE}] ResourceDependencies:[]}]

which shows that "1" was unmarshaled as ContainerPulled and "5" was unmarshaled as ContainerStopped which is expected.

  • Restarted Agent and observed the logs again.
level=debug time=2024-04-05T17:31:54Z msg="Parsing container: {\"DockerId\":\"3ac5b45fca171d7047a7b870c26ce5765c42a9d0d02b9adbf7bafe31c478a435\",\"DockerName\":\"ecs-tss-sleep-awsvpc-2-sleep-f0b4b7c6c0cfe1b23a00\",\"Container\":{\"Name\":\"sleep\",\"RuntimeID\":\"3ac5b45fca171d7047a7b870c26ce5765c42a9d0d02b9adbf7bafe31c478a435\",\"taskARN\":\"arn:aws:ecs:us-west-2:979604884904:task/test/53a8a19f210b43a0aa727f7d3b3d6865\",\"V3EndpointID\":\"fe5d2f12-9ec9-4ac5-81f4-454118021fe2\",\"Image\":\"busybox:latest\",\"ImageID\":\"sha256:ba5dc23f65d4cc4a4535bce55cf9e63b068eb02946e3422d3587e8ce803b6aab\",\"ImageDigest\":\"sha256:650fd573e056b679a5110a70aabeb01e26b76e545ec4b9c70a9523f2dfaf18c6\",\"Command\":[\"sh\",\"-c\",\"sleep 600\"],\"Cpu\":128,\"GPUIDs\":null,\"Memory\":16,\"Links\":null,\"firelensConfiguration\":null,\"volumesFrom\":[],\"mountPoints\":[],\"portMappings\":null,\"secrets\":null,\"Essential\":true,\"EntryPoint\":null,\"environment\":{\"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI\":\"/v2/credentials/1b9f82aa-68d8-48b7-aa93-9e9c50b7d3d8\",\"AWS_EXECUTION_ENV\":\"AWS_ECS_EC2\",\"ECS_AGENT_URI\":\"http://169.254.170.2/api/fe5d2f12-9ec9-4ac5-81f4-454118021fe2\",\"ECS_CONTAINER_METADATA_URI\":\"http://169.254.170.2/v3/fe5d2f12-9ec9-4ac5-81f4-454118021fe2\",\"ECS_CONTAINER_METADATA_URI_V4\":\"http://169.254.170.2/v4/fe5d2f12-9ec9-4ac5-81f4-454118021fe2\"},\"environmentFiles\":null,\"overrides\":{\"command\":null},\"dockerConfig\":{\"config\":\"{}\",\"hostConfig\":\"{\\\"NetworkMode\\\":\\\"awsvpc\\\",\\\"CapAdd\\\":[],\\\"CapDrop\\\":[],\\\"Sysctls\\\":{}}\",\"version\":\"1.18\"},\"registryAuthentication\":null,\"LogsAuthStrategy\":\"\",\"StartTimeout\":0,\"StopTimeout\":0,\"desiredStatus\":\"RUNNING\",\"KnownStatus\":\"RUNNING\",\"TransitionDependencySet\":{\"PULLED\":{\"ContainerDependencies\":[{\"ContainerName\":\"~internal~ecs~pause\",\"SatisfiedStatus\":\"RESOURCES_PROVISIONED\"}],\"ResourceDependencies\":[{\"Name\":\"cgroup\",\"RequiredStatus\":1}]}},\"RunDependencies\":null,\"IsInternal\":\"NORMAL\",\"ApplyingError\":null,\"SentStatus\":\"RUNNING\",\"metadataFileUpdated\":false,\"KnownExitCode\":null,\"KnownPortBindings\":null,\"ContainerArn\":\"arn:aws:ecs:us-west-2:979604884904:container/test/53a8a19f210b43a0aa727f7d3b3d6865/7ffcfd76-0bc8-4cdd-8651-024e383973b2\",\"containerTornDown\":false,\"ContainerHasPortRange\":false,\"ContainerPortSet\":{},\"ContainerPortRangeMap\":{}}}" module=container_client.go
level=debug time=2024-04-05T17:31:54Z msg="Parsed container transition dependencies: map[PULLED:{ContainerDependencies:[{ContainerName:~internal~ecs~pause SatisfiedStatus:RESOURCES_PROVISIONED DependentStatus:NONE}] ResourceDependencies:[{Name:cgroup RequiredStatus:1}]}]" module=container_client.go
level=debug time=2024-04-05T17:31:54Z msg="Parsing container: {\"DockerId\":\"9834bcaed23aafd2c9d19fad81ba4bda145600e3e5fe877046aff28ec598cb29\",\"DockerName\":\"ecs-tss-sleep-awsvpc-2-internalecspause-bca291b5db8f88fd3400\",\"Container\":{\"Name\":\"~internal~ecs~pause\",\"RuntimeID\":\"9834bcaed23aafd2c9d19fad81ba4bda145600e3e5fe877046aff28ec598cb29\",\"taskARN\":\"arn:aws:ecs:us-west-2:979604884904:task/test/53a8a19f210b43a0aa727f7d3b3d6865\",\"V3EndpointID\":\"\",\"Image\":\"amazon/amazon-ecs-pause:0.1.0\",\"ImageID\":\"sha256:9dd4685d3644733005b3f933b3323c7a3b0e2a1cbcdf9a5264fd0a40861a3379\",\"ImageDigest\":\"\",\"Command\":null,\"Cpu\":0,\"GPUIDs\":null,\"Memory\":0,\"Links\":null,\"firelensConfiguration\":null,\"volumesFrom\":null,\"mountPoints\":null,\"portMappings\":null,\"secrets\":null,\"Essential\":true,\"EntryPoint\":null,\"environment\":null,\"environmentFiles\":null,\"overrides\":{\"command\":null},\"dockerConfig\":{\"config\":null,\"hostConfig\":null,\"version\":null},\"registryAuthentication\":null,\"LogsAuthStrategy\":\"\",\"StartTimeout\":0,\"StopTimeout\":0,\"desiredStatus\":\"RESOURCES_PROVISIONED\",\"KnownStatus\":\"RESOURCES_PROVISIONED\",\"TransitionDependencySet\":{\"STOPPED\":{\"ContainerDependencies\":[{\"ContainerName\":\"sleep\",\"SatisfiedStatus\":\"STOPPED\"}],\"ResourceDependencies\":null}},\"RunDependencies\":null,\"IsInternal\":\"CNI_PAUSE\",\"ApplyingError\":null,\"SentStatus\":\"NONE\",\"metadataFileUpdated\":false,\"KnownExitCode\":null,\"KnownPortBindings\":null,\"SteadyStateStatus\":\"RESOURCES_PROVISIONED\",\"containerTornDown\":false,\"ContainerHasPortRange\":false,\"ContainerPortSet\":{},\"ContainerPortRangeMap\":{}}}" module=container_client.go
level=debug time=2024-04-05T17:31:54Z msg="Parsed container transition dependencies: map[STOPPED:{ContainerDependencies:[{ContainerName:sleep SatisfiedStatus:STOPPED DependentStatus:NONE}] ResourceDependencies:[]}]" module=container_client.go

This time Agent saw string based keys for ContainerStatus when parsing containers from the state file which means that the new Agent correctly marshaled ContainerStatus keys as strings instead of integers.

\"TransitionDependencySet\":{\"PULLED\":{\"ContainerDependencies\":[{\"ContainerName\":\"~internal~ecs~pause\",\"SatisfiedStatus\":\"RESOURCES_PROVISIONED\"}],\"ResourceDependencies\":[{\"Name\":\"cgroup\",\"RequiredStatus\":1}]}}
\"TransitionDependencySet\":{\"STOPPED\":{\"ContainerDependencies\":[{\"ContainerName\":\"sleep\",\"SatisfiedStatus\":\"STOPPED\"}],\"ResourceDependencies\":null}}

After parsing both containers had the same value as before for TransitionDependencySet.

map[PULLED:{ContainerDependencies:[{ContainerName:~internal~ecs~pause SatisfiedStatus:RESOURCES_PROVISIONED DependentStatus:NONE}] ResourceDependencies:[{Name:cgroup RequiredStatus:1}]}]
map[STOPPED:{ContainerDependencies:[{ContainerName:sleep SatisfiedStatus:STOPPED DependentStatus:NONE}] ResourceDependencies:[]}]

New tests cover the changes: yes

Description for the changelog

Enhancement: Marshal ContainerStatus as strings instead of integers for JSON keys by implementing encoding.TextMarshaler.

Does this PR include breaking model changes? If so, Have you added transformation functions?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@amogh09 amogh09 requested a review from a team as a code owner April 5, 2024 17:10
@prateekchaudhry
Copy link
Contributor

While the task is running, stopped Agent and started a new Agent with the changes in this PR. Some extra logs were added to observe parsing of containers from the state file. Observed the logs below.

Do we need to test reverse, launching task with new agent and moving back to previous agent?

@amogh09
Copy link
Contributor Author

amogh09 commented Apr 8, 2024

Thanks @prateekchaudhry. Yes, I have added a new "Forward compatibility" section to the PR description.

@amogh09 amogh09 force-pushed the container-status-text branch from 9603beb to b2836a5 Compare April 9, 2024 16:36
@amogh09 amogh09 merged commit 22af4f2 into aws:dev Apr 9, 2024
40 checks passed
@harishxr harishxr mentioned this pull request Apr 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants