[Bug] Container fails to launch on AWS Fargate due to IO error #1033

Open
jakeybrown92 opened this issue Jan 16, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@jakeybrown92

jakeybrown92 commented Jan 16, 2024

Description

We are using docker buildx to build docker images for amd64 and arm64. We then have a script to create a soci index for both platforms and push to ECR.

export REGISTRY_PASSWORD=$(aws ecr get-login-password --region $AWS_DEFAULT_REGION)
ctr image pull --user AWS:$REGISTRY_PASSWORD $AWS_ECR_ACCOUNT_URL/$IMAGE_TYPE:$IMAGE_TAG-latest --platform linux/amd64 --platform linux/arm64 >/dev/null 2>&1 || true
soci create $AWS_ECR_ACCOUNT_URL/$IMAGE_TYPE:$IMAGE_TAG-latest --platform linux/amd64 --platform linux/arm64
soci push --user AWS:$REGISTRY_PASSWORD $AWS_ECR_ACCOUNT_URL/$IMAGE_TYPE:$IMAGE_TAG-latest --platform linux/amd64 --platform linux/arm64

This works fine for all of our other Docker images except the Python ones (the issue is the same across all Python versions). We build on top of the public AWS SAM build image for Python. The Docker entrypoint script is copied into the image and then executed (the same as in our other, working builds).

FROM --platform=$BUILDPLATFORM public.ecr.aws/sam/build-python3.8:latest
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["tini", "--", "/usr/local/bin/docker-entrypoint.sh"]

The ECS Fargate container fails to start. The task definition is running on x86. When checking the ECS logs, I can see the following error:
exec /usr/local/bin/docker-entrypoint.sh: input/output error
As soon as we delete the SOCI index and image index from ECR for that particular image, the issue goes away and the container is pulled as expected. So we know the image itself is not the problem; the failure only occurs when a SOCI index exists for the image.

Steps to reproduce the bug

No response

Describe the results you expected

The container to lazy load the python images with no errors.

Host information

  1. OS:
  2. Snapshotter Version: v0.4.1
  3. Containerd Version: 1.6.17

Any additional context or information about the bug

No response

@jakeybrown92 jakeybrown92 added the bug Something isn't working label Jan 16, 2024
@austinvazquez
Contributor

Hi @jakeybrown92 thanks for trying out SOCI snapshotter! For your issue, I have tested and successfully launched a container using public.ecr.aws/sam/build-python3.8:latest as a base on SOCI v0.4.1.

Are you able to reproduce the issue running your container in a non-Fargate environment and provide the SOCI logs to give some more insights to the error which is preventing your container from launching?

Alternatively if you have an AWS Support plan you can file a technical support ticket with AWS Fargate which will enable us to investigate it with the service team. (Not a requirement by any means; just mentioning it because of the AWS Fargate reference)

@austinvazquez austinvazquez changed the title from "[Bug]" to "[Bug] Container fails to launch on AWS Fargate due to IO error" Jan 17, 2024
@jakeybrown92
Author

@austinvazquez I'm running on a Mac and was unable to get the SOCI snapshotter binaries working as expected, so I have built a Docker image that contains the SOCI binaries and pull it locally. I am then pulling the image from ECR and creating the SOCI index etc. There seems to be an issue with soci-snapshotter-grpc when running it locally in Docker though, so I'm unsure where I can look at logs. I am able to run the Python images stored in ECR as containers, both inside the SOCI container locally and directly on my Mac, though I'm not sure the SOCI indexing comes into play there. I have raised a case with AWS in the meantime:
170550340001721

@austinvazquez
Contributor

austinvazquez commented Jan 23, 2024

@jakeybrown92, I would like to touch base on this. I have reached out to the AWS Fargate service team and am working to get access to your service ticket so we can begin a root cause analysis.

With respect to running SOCI on a MacBook, I am not familiar with Docker Desktop, but if you are looking for an alternative, I have used Finch, which allows developers to drop into the Linux VM. [Reference]

Finch has SOCI integration out of the box. See Finch's Lazy Loading documentation.

@jakeybrown92
Author

jakeybrown92 commented Jan 24, 2024

Hi @austinvazquez, I did not know about Finch. That looks really useful; I will use it for other things in the future! After starting the VM, tailing journalctl, and running the command below:
finch --snapshotter soci run AWS_ECR_ACCOUNT_URL.dkr.ecr.eu-west-2.amazonaws.com/aws-python3.8-latest --platform linux/arm64

I can see some errors in the logs:
Jan 24 11:28:13 lima-finch soci-snapshotter-grpc[1561]: {"key":"finch/235/extract-144466077-9T9h sha256:eb947d9b2e666342317653c3fa40bc74aab421789aee22b8dd32db4d63501df1","level":"info","msg":"preparing filesystem mount at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/131/fs","parent":"finch/136/sha256:8bbee9c3d40e392d3e3bec61298a9907d48b8367066f6581dbe65797e4224aee","time":"2024-01-24T11:28:13.149631036Z"}
Jan 24 11:28:13 lima-finch soci-snapshotter-grpc[1561]: {"error":"unable to fetch SOCI artifacts: cannot fetch list of referrers: unable to fetch referrers: GET \"https://AWS_ECR_ACCOUNT_URL.dkr.ecr.eu-west-2.amazonaws.com/v2/referrers/sha256:378d595be9a5daeead0a87dfb64bd6fe680efb8d4ed46df12e8e298b3985142a?artifactType=application%2Fvnd.amazon.soci.index.v1%2Bjson\": credential required for basic auth","key":"finch/235/extract-144466077-9T9h sha256:eb947d9b2e666342317653c3fa40bc74aab421789aee22b8dd32db4d63501df1","level":"warning","msg":"failed to prepare remote snapshot","parent":"finch/136/sha256:8bbee9c3d40e392d3e3bec61298a9907d48b8367066f6581dbe65797e4224aee","remote-snapshot-prepared":"false","time":"2024-01-24T11:28:13.149963578Z"}
Jan 24 11:28:13 lima-finch soci-snapshotter-grpc[1561]: {"layerDigest":"sha256:ed40a07820c97a6060a1faa96dfebab2931c499e316852d835a365ba8b2ded12","level":"info","msg":"preparing snapshot as local snapshot","time":"2024-01-24T11:28:13.150310703Z"}
Jan 24 11:28:13 lima-finch soci-snapshotter-grpc[1561]: {"key":"finch/235/extract-144466077-9T9h sha256:eb947d9b2e666342317653c3fa40bc74aab421789aee22b8dd32db4d63501df1","level":"info","msg":"preparing local filesystem at mountpoint=/var/lib/soci-snapshotter-grpc/snapshotter/snapshots/131/fs","parent":"finch/136/sha256:8bbee9c3d40e392d3e3bec61298a9907d48b8367066f6581dbe65797e4224aee","time":"2024-01-24T11:28:13.150353120Z"}
Jan 24 11:28:13 lima-finch soci-snapshotter-grpc[1561]: {"digest":"sha256:ed40a07820c97a6060a1faa96dfebab2931c499e316852d835a365ba8b2ded12","key":"finch/235/extract-144466077-9T9h sha256:eb947d9b2e666342317653c3fa40bc74aab421789aee22b8dd32db4d63501df1","level":"info","msg":"fetching artifact from remote","parent":"finch/136/sha256:8bbee9c3d40e392d3e3bec61298a9907d48b8367066f6581dbe65797e4224aee","time":"2024-01-24T11:28:13.150812037Z"}
Jan 24 11:28:13 lima-finch soci-snapshotter-grpc[1561]: {"error":"cannot unpack the layer: cannot fetch layer: unable to fetch descriptor (sha256:ed40a07820c97a6060a1faa96dfebab2931c499e316852d835a365ba8b2ded12) from remote store: GET \"https://AWS_ECR_ACCOUNT_URL.dkr.ecr.eu-west-2.amazonaws.com/v2/blobs/sha256:ed40a07820c97a6060a1faa96dfebab2931c499e316852d835a365ba8b2ded12\": credential required for basic auth","key":"finch/235/extract-144466077-9T9h sha256:eb947d9b2e666342317653c3fa40bc74aab421789aee22b8dd32db4d63501df1","level":"warning","msg":"failed to prepare snapshot; deferring to container runtime","parent":"finch/136/sha256:8bbee9c3d40e392d3e3bec61298a9907d48b8367066f6581dbe65797e4224aee","time":"2024-01-24T11:28:13.826788841Z"}
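(Editorial aside, not part of the original comment.) Journal output like the above is easier to triage once the JSON payloads are separated from the journald prefixes and filtered by level. The following is a minimal sketch, assuming each payload is a JSON object with "level", "msg", "time", and optionally "error" keys, as in the entries above. The function names `iter_entries` and `problems` are hypothetical, and `SAMPLE` is an abbreviated version of two of the entries (keys dropped, error text shortened), not verbatim log output.

```python
import json

# Abbreviated sample based on the log entries in this thread
# (some keys dropped and the error text shortened for illustration).
SAMPLE = (
    'Jan 24 11:28:13 lima-finch soci-snapshotter-grpc[1561]: '
    '{"level":"info","msg":"preparing snapshot as local snapshot",'
    '"time":"2024-01-24T11:28:13.150310703Z"} '
    'Jan 24 11:28:13 lima-finch soci-snapshotter-grpc[1561]: '
    '{"error":"cannot unpack the layer: credential required for basic auth",'
    '"level":"warning",'
    '"msg":"failed to prepare snapshot; deferring to container runtime",'
    '"time":"2024-01-24T11:28:13.826788841Z"}'
)


def iter_entries(journal_text):
    """Yield every JSON object embedded in journal output as a dict."""
    decoder = json.JSONDecoder()
    pos = journal_text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # returns it together with the index just past its end.
            entry, end = decoder.raw_decode(journal_text, pos)
        except json.JSONDecodeError:
            # This brace did not start valid JSON; keep scanning.
            pos = journal_text.find("{", pos + 1)
            continue
        if isinstance(entry, dict):
            yield entry
        pos = journal_text.find("{", end)


def problems(journal_text):
    """Return (time, msg, error) tuples for warning-or-worse entries."""
    levels = {"warning", "error", "fatal", "panic"}
    return [
        (e.get("time"), e.get("msg"), e.get("error"))
        for e in iter_entries(journal_text)
        if e.get("level") in levels
    ]


if __name__ == "__main__":
    for time, msg, error in problems(SAMPLE):
        print(f"{time}  {msg}: {error}")
```

Scanning for JSON objects rather than splitting on newlines means the sketch also copes with several entries pasted onto one line. In practice the text would come from the saved journalctl output instead of `SAMPLE`.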

I have also tested Finch with another image that works successfully in Fargate with a SOCI index (the amazonlinux Docker image). I can confirm it works as expected and the image is lazy loaded from ECR. I also don't see any of the above errors in the logs.
