This repository has been archived by the owner on Nov 7, 2024. It is now read-only.

Parsing layer blob: Broken pipe #657

Closed
cgwalters opened this issue Aug 30, 2024 · 45 comments · Fixed by #677

Comments

@cgwalters
Member

stderr: ERROR: Switching: Pulling: Importing: Parsing layer blob sha256:4367367aae6325ce7351edb720e7e6929a7f369205b38fa88d140b7e3d0a274f: Broken pipe (os error 32)"

This one is like my enemy! I have a tracker for it over here too: coreos/rpm-ostree#4567


Discoveries so far:

More generally it's definitely a race condition; I can sometimes reproduce this by doing
ostree refs --delete ostree/container and then re-running the rebase.

Also of note: kola defaults to a uniprocessor VM, which I think is more likely to expose this race.

I'm quite certain it has something to do with the scheduling of us closing the pipe vs calling FinishPipe.
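To illustrate the failure mode I mean (this is just a standalone sketch, not code from this repo): if the reading side closes its end before the writing side is done, the writer's next write fails with EPIPE, i.e. "Broken pipe (os error 32)".

use std::io::Write;
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // Stand-ins for the two sides of the blob pipe: `writer` plays the role
    // of the producer (skopeo), `reader` the consumer (our import task).
    let (mut writer, reader) = UnixStream::pair()?;

    // The consumer gives up and closes its end without draining it,
    // analogous to us closing the pipe before FinishPipe has completed.
    drop(reader);

    // The producer keeps writing and hits EPIPE.
    let chunk = [0u8; 8192];
    loop {
        if let Err(e) = writer.write_all(&chunk) {
            // Prints "Broken pipe (os error 32)"; Rust ignores SIGPIPE by default.
            eprintln!("producer write failed: {e}");
            break;
        }
    }
    Ok(())
}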

@ggiguash

ggiguash commented Sep 4, 2024

@cgwalters , the problem came up again in MicroShift CI (see this log as an example).
Is there anything we can add to our CI on-error handler to produce debugging information to help troubleshoot this problem?

@eslutsky

@cgwalters, our CI is in great pain because of this issue. Is there a known workaround?

@cgwalters
Member Author

@cgwalters, our CI is in great pain because of this issue. Is there a known workaround?

Not currently, other than retrying, which we should probably add as default behavior...
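Something like this is about the shape of the CI-side workaround in the meantime (a rough sketch of a retry-with-backoff wrapper that just shells out to bootc; the image reference and attempt count are placeholders):

use std::process::Command;
use std::thread::sleep;
use std::time::Duration;

/// Retry `bootc switch` a few times with linear backoff. Returns Ok(true)
/// once a switch succeeds, Ok(false) if every attempt failed.
fn switch_with_retries(image: &str, attempts: u32) -> std::io::Result<bool> {
    for attempt in 1..=attempts {
        let status = Command::new("bootc")
            .args(["switch", "--quiet", image])
            .status()?;
        if status.success() {
            return Ok(true);
        }
        eprintln!("bootc switch failed (attempt {attempt}/{attempts}): {status}");
        sleep(Duration::from_secs(5 * u64::from(attempt)));
    }
    Ok(false)
}

fn main() -> std::io::Result<()> {
    // Placeholder image reference; adjust for your registry.
    if !switch_with_retries("192.168.100.1:5000/cos9-bootc-source", 5)? {
        std::process::exit(1);
    }
    Ok(())
}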

@ggiguash

ggiguash commented Sep 20, 2024

@cgwalters, this problem is causing a lot of grief in MicroShift CI. It shows up everywhere, whether booting VMs or even just running tests that pull a new layer.

Is there a chance we add the retry workaround sooner rather than later in CentOS 9 and also backport it to RHEL 9.4? Once implemented, we will immediately provide feedback on whether it made a difference.

@ggiguash

I can "easily" reproduce this problem on a local VM by running the following command to alternate between the images (new -> original -> new -> etc.) without rebooting.

Is there any additional debug information I can provide to give a clue on why this is happening?

# while time bootc switch 192.168.100.1:5000/cos9-bootc-source-optionals --quiet ; do time bootc switch 192.168.100.1:5000/cos9-bootc-source --quiet; done
layers already present: 74; layers needed: 1 (45.6 MB)
Queued for next boot: 192.168.100.1:5000/cos9-bootc-source-optionals
  Version: stream9.20240913.0
  Digest: sha256:1c1de27d0e3d16513687381b702f168726352ae623c04e3272659d1adfddc301

real    0m18.082s
user    0m10.986s
sys     0m3.943s
No changes in 192.168.100.1:5000/cos9-bootc-source => sha256:f2e7c8c61b002dff80e33eb44320d5c9c1d272957227008715e51310be94cb2f
Pruned images: 1 (layers: 7, objsize: 330.1 MB)
Queued for next boot: 192.168.100.1:5000/cos9-bootc-source
  Version: stream9.20240913.0
  Digest: sha256:f2e7c8c61b002dff80e33eb44320d5c9c1d272957227008715e51310be94cb2f

real    0m8.375s
user    0m3.089s
sys     0m3.610s

...
...

layers already present: 68; layers needed: 7 (807.4 MB)
ERROR Switching: Pulling: Importing: Parsing layer blob sha256:b867f3d4a54bac425f7782d4953923c99729b03b19668cf3cbe6176ba4b557e3: Failed to invoke skopeo proxy method FinishPipe: remote error: write |1: broken pipe

real    0m20.450s
user    0m5.339s
sys     0m3.524s

On the registry server side, I'm seeing the following messages in the log when the problem occurs. Looks like the issue is on the client.

time="2024-09-25T05:21:48.560496809Z" level=info msg="response completed" go.version=go1.20.8 http.request.host="192.168.100.1:5000" http.request.id=ccfd5189-e2f8-41da-ad28-e99c8046c909 http.request.method=GET http.request.remoteaddr="192.168.100.151:50422" http.request.uri="/v2/cos9-bootc-source-optionals/blobs/sha256:b867f3d4a54bac425f7782d4953923c99729b03b19668cf3cbe6176ba4b557e3" http.request.useragent="skopeo/1.16.1" http.response.contenttype="application/octet-stream" http.response.duration=945.016873ms http.response.status=200 http.response.written=45551104 
192.168.100.151 - - [25/Sep/2024:05:21:47 +0000] "GET /v2/cos9-bootc-source-optionals/blobs/sha256:b867f3d4a54bac425f7782d4953923c99729b03b19668cf3cbe6176ba4b557e3 HTTP/1.1" 200 45551104 "" "skopeo/1.16.1"

@jeckersb
Contributor

I can "easily" reproduce this problem on a local VM by running the following command to alternate between the images (new -> original -> new -> etc.) without rebooting.

I've been trying to reproduce this today without much success. I managed to do it one time by switching back and forth, and then I started poking at things with perf, but haven't been able to reproduce it anymore. I think I must have just gotten really lucky with the first success. I wouldn't expect perf to alter the timings enough to push this into Heisenbug territory since it's mostly "out-of-band", but who knows. I'll let it continue to run in a loop for a while and see if I can't get it to trigger again.

@cgwalters
Member Author

As far as tweaking timing goes, one thing to try is using e.g. strace fault injection to add random delays (especially to read and write), something like:

strace -f -e inject=read:when=1+3:delay_enter=100ms bootc ...

(Fine tune the injection step and delay as desired)
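If you're hacking on the Rust side anyway, another option is to wrap the blob reader in something that injects delays to perturb the scheduling. Purely illustrative, nothing like this is in the tree:

use std::io::{Read, Result};
use std::thread::sleep;
use std::time::Duration;

/// Illustrative only: a reader wrapper that sleeps before every read, to
/// perturb the timing between the reader finishing and the pipe teardown.
struct SlowReader<R> {
    inner: R,
    delay: Duration,
}

impl<R: Read> Read for SlowReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        sleep(self.delay);
        self.inner.read(buf)
    }
}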

@jeckersb
Contributor

I let it run for a bit more Wednesday, no luck.

Yesterday I took the opportunity to rebuild my dev setup with a fresh podman-machine, and latest podman-bootc from git. Then I let it run all day trying to reproduce. More precisely, what I did was:

  • Built latest bootc image from main following the instructions
  • Used podman-bootc to run that
  • I mirrored fedora-bootc:40 and fedora-bootc:41 into my local registry (I went directly at quay.io for a while but started getting random timeouts)
  • Switched back and forth between 40 and 41 using: while perf record -e raw_syscalls:sys_exit --filter 'ret == -32' -a -g bootc switch --quiet 192.168.122.1:5000/fedora-bootc:40 && perf record -e raw_syscalls:sys_exit --filter 'ret == -32' -a -g bootc switch --quiet 192.168.122.1:5000/fedora-bootc:41; do :; done

Couldn't get it to reproduce, ran it for about 6 hours like that. Then I ran it again, except removing all of the perf bits just to see if that made a difference, but it did not. Still unable to reproduce it.

At this point it's driving me absolutely mad that I was able to reproduce it the one time within like 5 minutes of trying, and then never again 😠

@ggiguash

ggiguash commented Sep 29, 2024

@jeckersb, I used the following commands to reproduce the error and I'm attaching the perf.data file.
What else can I do to help?

bootc usr-overlay
dnf install -y perf

SERV='192.168.100.1:5000'
PERF='perf record -e raw_syscalls:sys_exit --filter "ret==-32" -a -g'
CMD1='bootc switch --quiet $SERV/cos9-bootc-source-optionals'
CMD2='bootc switch --quiet $SERV/cos9-bootc-source'

printf "[[registry]]\nlocation = \"${SERV}\"\ninsecure = true\n" > /etc/containers/registries.conf.d/999-insecure-registry.conf

while true; do
    eval "$PERF $CMD1" || break
    eval "$PERF $CMD2" || break
done

The error was the "usual" one.

ERROR Switching: Pulling: Importing: Parsing layer blob sha256:d2dbfce5c205c1e1408e7efc2588dc4f9e06cbb71df6203cbbbf4149d0de80cb: Failed to invoke skopeo proxy method FinishPipe: remote error: write |1: broken pipe

perf.data.zip

@jeckersb
Contributor

@ggiguash thanks, definitely a step in the right direction!

Do you have the kernel version and the version of skopeo? Or alternatively if you still have the system just run perf script and paste the output in here. I don't have the debuginfo for whatever version you're running so right now I get this mostly unhelpful stacktrace:

skopeo   17353 [000]  2450.980647: raw_syscalls:sys_exit: NR 1 = -32
        ffffffff939c8990 [unknown] ([kernel.kallsyms])
        ffffffff939c8990 [unknown] ([kernel.kallsyms])
        ffffffff944d4239 [unknown] ([kernel.kallsyms])
        ffffffff944cf46b [unknown] ([kernel.kallsyms])
        ffffffff94600130 [unknown] ([kernel.kallsyms])
            561f0b87a4ce [unknown] (/usr/bin/skopeo)
            561f0b8ffb06 [unknown] (/usr/bin/skopeo)
            561f0b8fdcbb [unknown] (/usr/bin/skopeo)
            561f0b95f2a8 [unknown] (/usr/bin/skopeo)
            561f0b968fb1 [unknown] (/usr/bin/skopeo)
            561f0b9711a5 [unknown] (/usr/bin/skopeo)
            561f0b95505e [unknown] (/usr/bin/skopeo)
            561f0b968f18 [unknown] (/usr/bin/skopeo)
            561f0b968e3c [unknown] (/usr/bin/skopeo)
            561f0b954fd1 [unknown] (/usr/bin/skopeo)
            561f0c3ffa69 [unknown] (/usr/bin/skopeo)
            561f0b8e74c1 [unknown] (/usr/bin/skopeo)

But at the very least from that we can see that it's skopeo getting the EPIPE.

@jeckersb
Contributor

Another possibly related CI failure:

Installing to filesystem: Creating ostree deployment: Pulling: Importing: Unencapsulating base: failed to invoke method FinishPipe: failed to invoke method FinishPipe: expected 45 bytes in blob, got 139264

@ggiguash

ggiguash commented Sep 30, 2024

@jeckersb, here's the stack after installing the debug symbols for skopeo. I'm also uploading a perf archive which you can unpack with perf archive --unpack and view on your machine with symbols (at least according to the manual). Please rename the file to have a .bz2 extension before unpacking.
perf.data.tar.zip

# rpm -q skopeo skopeo-debuginfo 
skopeo-1.16.1-1.el9.x86_64
skopeo-debuginfo-1.16.1-1.el9.x86_64
# uname -a
Linux localhost.localdomain 5.14.0-511.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Sep 19 06:52:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
# perf script
skopeo   17353 [000]  2450.980647: raw_syscalls:sys_exit: NR 1 = -32
        ffffffff939c8990 syscall_exit_work+0xb0 ([kernel.kallsyms])
        ffffffff939c8990 syscall_exit_work+0xb0 ([kernel.kallsyms])
        ffffffff944d4239 syscall_exit_to_user_mode+0x9 ([kernel.kallsyms])
        ffffffff944cf46b do_syscall_64+0x6b ([kernel.kallsyms])
        ffffffff94600130 entry_SYSCALL_64+0xb0 ([kernel.kallsyms])
            561f0b87a4ce runtime/internal/syscall.Syscall6+0xe (/usr/bin/skopeo)
            561f0b8ffb06 syscall.Syscall+0x46 (/usr/bin/skopeo)
            561f0b8fdcbb syscall.write+0x3b (/usr/bin/skopeo)
            561f0b95f2a8 internal/poll.(*FD).Write+0x368 (/usr/bin/skopeo)
            561f0b968fb1 os.(*File).Write+0x51 (/usr/bin/skopeo)
            561f0b9711a5 os.(*fileWithoutReadFrom).Write+0x25 (/usr/bin/skopeo)
            561f0b95505e io.copyBuffer+0x1de (/usr/bin/skopeo)
            561f0b968f18 os.genericReadFrom+0x58 (/usr/bin/skopeo)
            561f0b968e3c os.(*File).ReadFrom+0x9c (/usr/bin/skopeo)
            561f0b954fd1 io.copyBuffer+0x151 (/usr/bin/skopeo)
            561f0c3ffa69 main.(*proxyHandler).GetBlob.func1+0x1c9 (/usr/bin/skopeo)
            561f0b8e74c1 runtime.goexit.abi0+0x1 (/usr/bin/skopeo)

@cgwalters
Member Author

Installing to filesystem: Creating ostree deployment: Pulling: Importing: Unencapsulating base: failed to invoke method FinishPipe: failed to invoke method FinishPipe: expected 45 bytes in blob, got 139264

That's a much more interesting error... it's almost like we somehow swapped data streams? I think the one we're generally seeing ends like:

remote error: write |1: broken pipe

and my suspicion is that there's some other underlying error being masked in this case.

@jeckersb
Contributor

jeckersb commented Oct 1, 2024

The one time I got it to reproduce, I got:

ERROR Switching: Pulling: Importing: Parsing layer blob sha256:eea8727e609713e7cc52a52b0835a6e8dbc35a2f9bbf7d8e1c3ad6a095f4d022: Broken pipe (os error 32)

Which is only notable because it's not a remote error, so I guess it's failing on the "client" side.

My plan of attack for tomorrow is to figure out the right perf incantation for us to get a stacktrace when we drop a pipe in rust and then see if that sheds any light on things.

cgwalters added a commit to cgwalters/skopeo that referenced this issue Oct 11, 2024
cgwalters added a commit to cgwalters/skopeo that referenced this issue Oct 11, 2024
cgwalters added a commit to cgwalters/skopeo that referenced this issue Oct 11, 2024
cgwalters added a commit to containers/skopeo that referenced this issue Oct 12, 2024
@jeckersb
Contributor

jeckersb commented Oct 16, 2024

I've been poking at this for a while now, overdue for at least some sort of update on where this stands.

I realized this is reproducible via just doing a bootc install to-disk, which makes it a bit simpler to parallelize the reproducer. I've been unable to reproduce the issue at all on my f40 laptop, so I was suspicious that maybe there was some difference between stream9 and f40 as the host system. Here's the test setup I've been using so far:

  • I built a fedora-bootc-derived container from git main (quay.io/jeckersb/bootc)
  • I provisioned two VMs on my laptop, one running stream9 and one running f40. Each VM has four VCPUs and four additional 20G qcow disks attached.
  • On each VM I'm running install to-disk parallelized 4x:
    • i=0; while echo -e 'b\nc\nd\ne' | parallel 'podman run --rm --privileged --pid=host --security-opt label=type:unconfined_t -v /dev:/dev -v /var/lib/containers:/var/lib/containers -v .:/output quay.io/jeckersb/bootc bootc install to-disk --filesystem xfs --generic-image --wipe /dev/vd{}'; do echo completed iteration $i; i=$(( $i + 1 )); done

I let this run overnight last night. The stream9 VM failed after 66 iterations and the f40 VM failed after 102 iterations. Not particularly conclusive either way. However today I noticed that the f40 VM was out of date (notably running kernel-6.8.5-301.fc40.x86_64 whereas latest is kernel-6.11.3-200.fc40.x86_64), so I made sure both hosts were updated (stream9 only had non-meaningful changes) and rebooted into the latest to re-run the tests.

The stream9 VM failed after 57 iterations. As of writing this, the f40 VM is still going at 90 iterations. We'll see how that turns out. Ideally I would prefer to use fedora for reproducing because the perf tooling is more up-to-date and seems better able to make sense of rust debuginfo.

(Edit: f40 made it 102 iterations (coincidentally the same as last time) before it failed with a registry error unrelated to the broken pipe, restarting it...)

(Later edit: ran everything again, stream9 made 53 iterations and f40 76 iterations before both hit the broken pipe)

cgwalters added a commit to cgwalters/containers-image-proxy-rs that referenced this issue Oct 18, 2024
To aid debugging ostreedev/ostree-rs-ext#657

This will be especially handy with containers/skopeo#2439

Signed-off-by: Colin Walters <[email protected]>
@cgwalters
Member Author

I put up containers/containers-image-proxy-rs#70

@jeckersb
Contributor

I finally have a consistent reproducer for this, using only rpm-ostree (as ostree-container). This is on my laptop, Fedora 40/Silverblue.

Run as root:

# Put the data on tmpfs to minimize I/O delay (thanks @cgwalters for the nudge)
cd /tmp
# Export the image to oci-dir on tmpfs.  If instead read from host containers-storage (or registry),
# the problem no longer reproduces.
podman save --format=oci-dir -o fedora-bootc quay.io/fedora/fedora-bootc:40
# Create a cgroup limited to one cpu and put our shell into it
echo +cpuset >> /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/onecpu
echo 1 > /sys/fs/cgroup/onecpu/cpuset.cpus
echo $$ > /sys/fs/cgroup/onecpu/cgroup.procs
# Try to pull and fail!
ostree --repo=repo init
ostree container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc

@jeckersb
Contributor

Some observations about where and how it breaks:

  • It fails most of the time, but still succeeds some (3 times)
  • When it fails, 100% of the time it gets the first 10 layers successfully and fails on the 11th layer (15 times)
  • Sometimes when it fails, the subsequent retry gets to 35 layers and then fails on the 36th layer (3 times)
# while true; do rm -rf repo; ostree --repo=repo init; while ! ostree container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc; do :; done; done
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 412021763309f4c35b5780f46f2e83e8ea1c87c0096c098a48d3893f96965a7a
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 56825b7fb84259b9df2391bb78a85234f0106173e0896b1b246bdad60e55023e
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => a7bde8ef0cd17620b891d1c10ec785c4b6d15c76290019bf53983ea8b904363b
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 564fe0a5c1836ab3dcf9782076224f2543c2e704c1690cd92139d2a4cb9231a3
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 35; layers needed: 30 (314.8 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 2e763e13dc946f604bf62e74f01a1911f2683ffe02138f0f3e23f9f7a3ca2fa6
layers already present: 0; layers needed: 65 (792.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 54867a51c32436ef64db206d3dff13886293329cb4552aaf3a812916b31c37e2
layers already present: 0; layers needed: 65 (792.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => be0d8c5e5c4375f14a614f3d2cda28f081c0a806ca1b177d8c0543fd345b002a
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 35; layers needed: 30 (314.8 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 1b06be3f952fa3b96ca82a1b50bce3363258d6660e44139faa17088271b46b3a
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => acd22bf7dc4b7233953ee66f182b13e933beed151e70e6f5ef24e733e1e5ea97
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => a5e1497a6dd6ab0aafb68a5ffd1b40034c5341bcd01664dad6cea3c176f6866a
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 35; layers needed: 30 (314.8 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 0ad00fee5b3b8180359dbb4587374df69cd947a17abd12f428436e6d26290184
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 07a5bd2b132990e180feb7f2d2a15b0e858cf74202afce6a7c4d90513152e053
layers already present: 0; layers needed: 65 (792.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 0154b4e6560f5a854bbb709783056e56ed35c2dfd62a49b10506716d447d5e4b
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 878ab58c217dd06b4333b681d0b3e441e79773fa10e0aa8d1df4365798c160e9
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => f08532b37269a586b03087bf7f13a72cc5f9cba397ea187b3918506081dc6750
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 6a257a12c1295fd78efc221cff9c3c85a38c25a918152f4c237564542ad9cda5
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 60cfa84859c9815538cef9da24a621c82c00ab4fc4d45942b7877909f6a7fe71
layers already present: 0; layers needed: 65 (792.2 MB)
error: Importing: Unencapsulating base: Broken pipe (os error 32)
layers already present: 10; layers needed: 55 (516.2 MB)
Wrote: ostree-unverified-image:oci:fedora-bootc => 58c6c7eccda51f2a5d934cf113fef03423ad0e01ca6923414811c900853db914

Worth noting that this is the specific image (didn't realize until now that it was two weeks old):

# podman images a54573503344
REPOSITORY                   TAG         IMAGE ID      CREATED      SIZE
quay.io/fedora/fedora-bootc  40          a54573503344  2 weeks ago  1.61 GB

Layers history for this image:

# podman inspect a54573503344 | jq '.[0].History'
[
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "ostree export of commit 866f8b6c98a344d29794690588613f465375b2d9d1e30696fe8b9022a6526fab"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "python3-botocore-1.35.25-1.fc40.noarch"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "kernel-modules-6.10.12-200.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "podman-5:5.2.3-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "python3-libs-3.12.6-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "kernel-modules-core-6.10.12-200.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "nvidia-gpu-firmware-20240909-1.fc40.noarch"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "linux-firmware-20240909-1.fc40.noarch"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "libicu-74.2-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "rpm-4.19.1.1-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "qemu-user-static-mips-2:8.2.7-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "atheros-firmware-20240909-1.fc40.noarch"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "skopeo-1:1.16.1-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "kernel-core-6.10.12-200.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "binutils-2.41-37.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "amd-gpu-firmware-20240909-1.fc40.noarch"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "samba-client-libs-2:4.20.5-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "systemd-255.12-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "glib2-2.80.3-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "rpm-ostree-2024.8-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "bootc-0.1.16-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "qemu-user-static-ppc-2:8.2.7-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "qemu-user-static-xtensa-2:8.2.7-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "qemu-user-static-aarch64-2:8.2.7-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "mt7xxx-firmware-20240909-1.fc40.noarch"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "systemd-udev-255.12-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "qemu-user-static-sparc-2:8.2.7-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "coreutils-common-9.4-8.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "netavark-2:1.12.2-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "file-libs-5.45-4.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "qemu-user-static-riscv-2:8.2.7-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "qemu-user-static-arm-2:8.2.7-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "brcmfmac-firmware-20240909-1.fc40.noarch"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "NetworkManager-libnm-1:1.46.2-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "gnupg2-2.4.4-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "cracklib-dicts-2.9.11-5.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "qemu-user-static-x86-2:8.2.7-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "171 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "glibc-gconv-extra-2.39-22.fc40.x86_64 and glibc-2.39-22.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "glibc-common-2.39-22.fc40.x86_64 and systemd-pam-255.12-1.fc40.x86_64 and systemd-resolved-255.12-1.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "systemd-libs-255.12-1.fc40.x86_64 and python3-boto3-1.35.25-1.fc40.noarch"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "10 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "7 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "11 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "10 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "12 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "gettext-libs-0.22.5-4.fc40.x86_64 and openssh-9.6p1-1.fc40.4.x86_64 and tzdata-2024a-5.fc40.noarch and elfutils-libelf-0.191-4.fc40.x86_64"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "24 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "24 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "13 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "24 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "24 components"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "initramfs (kernel 6.10.12-200.fc40.x86_64) and rpmostree-unpackaged-content"
  },
  {
    "created": "2024-10-07T12:59:18Z",
    "created_by": "Reserved for new packages"
  }
]

@jeckersb
Contributor

With the extra debug output from the patches to containers-image-proxy-rs and skopeo that @cgwalters did:

# rm -rf repo; ostree --repo=repo init; ./ostree-container container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc
DEBU[0000] Executing method Initialize
DEBU[0000] Using SQLite blob info cache at /var/lib/containers/cache/blob-info-cache-v1.sqlite
DEBU[0000] Sending reply: err=<nil> value=0.2.6 pipeid=0
DEBU[0000] Executing method OpenImage
DEBU[0000] IsRunningImageAllowed for image oci:/tmp/fedora-bootc
DEBU[0000]  Using default policy section
DEBU[0000]  Requirement 0: allowed
DEBU[0000] Overall: allowed
DEBU[0000] Sending reply: err=<nil> value=1 pipeid=0
DEBU[0000] Executing method GetManifest
DEBU[0000] Sending reply: err=<nil> value=sha256:8811d095de970c03b2f240469deb12c7aebfe06c9fb3c5765ad9fb3edab62474 pipeid=8
DEBU[0000] Executing method FinishPipe
DEBU[0000] Completed pipe goroutine
DEBU[0000] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0000] Executing method GetFullConfig
DEBU[0000] Sending reply: err=<nil> value=<nil> pipeid=8
DEBU[0000] Executing method FinishPipe
DEBU[0000] Completed pipe goroutine
DEBU[0000] Sending reply: err=<nil> value=<nil> pipeid=0
layers already present: 0; layers needed: 65 (792.2 MB)
DEBU[0000] Executing method GetLayerInfo
DEBU[0000] Sending reply: err=<nil> value=[{sha256:6b07f0c6e1dcb71dda3ada76f1a9e7b23be4fc4c2a4629707e697d6ad98a285e 2226065 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:0102a1a302cf0ce62e055653488dad07f4f708223e4fa5dd488c7eb81682c466 14046888 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:34c14f4beaf03552e72612800196481b52f0cc90cbdcd88c6b5e2c23fc476728 66074403 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:09d4d924eeede5095cc17f3af6dfb76c9ccaaeb5eb00afb48066b01cdf5baebb 20479739 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:d694a9d0df87d565f4fb3043b17e9bc80ed7ac968e5fadaf3f83aacdb1defe1d 17445305 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:43f59a48f81e0dee3bb40d506f01d48f8b500384f2057d21bd9a8655fdc50ca6 39748491 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:bdea91dcfd51f0ecbbe2329698625e53c562033b4aee4bca399bcbd9b66dd4cf 39839588 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:344218d6d5bbd7701300d27258cc78ba484a1b963643c83c8f9e7d4df65a115a 39372856 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:b8e6bbdc819b26f4a67329a09eac4e2ef6ca93985cf0c08b19b52974b5291966 15882737 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:fa5c93800c2c174f8f540e0ea152166aec87b3632342e509e16ea84e38b4e1b0 11459987 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:e2d50822645c5856aa1e95979675e0fa56b61ed5741d39529be3246626dbab09 11720814 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:bdfd5348105da108fdad98422e2b7296429ef445d1517d398f764030263de4ba 29434896 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:123230ae22f5536ca41427019dbdd6e1ab6859629dbb6d90aa321c9c4d9c1c0b 11175324 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:fa4936627bd2b60e6ca70acd3955ce3cc911d6f9c6dbaf31be772d4835ed34b7 17774026 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:081ee7f3a84d14dc02c95f866ef4f3cde09d0468d812fa17303703460029d288 9390995 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:58ffe337d05aff160a5cadb6bdd9317ddf941b461c7be8a6a884f762cdf6dd79 22595288 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:1ac1819d205fa91bfb299641f9e556add1f2aac6b0901ea328abe5387033d0d7 7744717 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:b267dc19c59b52fd931d1a95a26e9ddfea1d654582cb4ee8aba9d1901daa3947 7140703 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:a3df3d61071f13a13e85a64fa4428e344dbb6b67581e9ded74a322cf65d7e6eb 5433968 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:22b1382d65ddacc185034c9e105a1fd398cb71288dead4c60620d563521ad90e 6475938 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:0d6793ee88fc85111d41b36445e0096ce08d23c590af5802d4dd13cd94b43cb5 6399193 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:cec6a7c0b849414fc148693a58add6f53eb90402e231c9f66260ae3205bff22d 6055065 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:ae5dbd1a7e0f920dc0a64f8283f788b72b8e01b667725dfa0c610e1fe7a0fee0 4633556 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:ccedc5eecb0d37dd54f5c7578b7cd16e92cd6634ef6a57bb6756789f3cbbe166 4995680 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:c73d9013fc92429394f832781a645c4e17701e044c297a56415e781d8b98479f 12674417 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:b62402d70b9e078679729d8b41d4b2904119c2aba1569128eb2054381a425112 3094406 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:6893418a58cfe1ef2ce7a49c7291ce400a80af2d8ac9b99af9194c7e6e98a985 5199972 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:f2233f0423a9618e57512950c60a90ef93ae0ebbc298083ee750ce7e44d4d7f5 4337280 
application/vnd.oci.image.layer.v1.tar+gzip} {sha256:36c7f9243bf6e1e21fb6a86d68611a928ae70ec0e444a04376b12fe408ef0d20 5034877 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:7d3ecd4dc0ee01b97f265caad6104055c23a0fef33be70e424084d3a80045ac5 1057960 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:10e6b76a040023049a59e113c1ea59d5b7c708372f7173da3010c8952929b52d 3881441 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:ddab368cc6a733b7ba9fc056a9b4e210297466d1435b51cd11e0ee749c907ee7 4222472 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:a05d958af586b3b1de5df7f75028418a604b8520f1172009468bc9f91721b2dd 10083385 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:fd4e1028991240eac38eb3d48f76b1b60132701521f7eec4f6e79744f1d245a5 3298450 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:aa792df6c97e5601431815bed0b35cc62019f50fe2e229ddf2738fd3fb97199e 4350223 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:e43ae6ecde822340c0f805465297049689733a6f626939dde4c1e0c4c7cde770 4874773 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:d53375c151adedf272adaa70a4512ba80ca73af7d976dc680e2d8a40aca6ecf9 3845718 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:cc8ece7b495df6c30731f7ad801606644e2f22f2dfb413948bcdcb3327fc3726 5030147 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:e1074faa767bd571e5c1b8b37127760c45c5ec1cceb3448a8315c2f5ac9ccead 6747490 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:585119d5ece74ad0facdefa8398b82f208c7df3b6625724f188331b9b9abedac 1271739 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:636a292020bbda464dd7c9762c80fc3db62dc7bed99dc64df93f1d9f27d9cf8b 1459488 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:7defe8475e060fd32ce57e70cd34015b2901e9c837064da90dc2129d7b64bcc9 37441757 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:dfb9640285c75ec73a2ae97ac8287820b01f69c7b9fc970e353fd20cf5acbdad 4561078 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:4be30725b843b2b7a753e377594ed1ef22c3943e1dce4eba728a54ff1be5b54f 1861698 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:3ba001ee06945ef598bf40b7e10301091ce6425b96cf67841fadb7c91138a3a8 1305570 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:7ac473f02864567211ce31fa3a0bf32e99e788799652acce0db76dec97fad09c 22997251 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:e0d5b9fb8284d99e3dea9d2ac272dfbdbc5d47a2ee1c54e72faf97d1745d1a08 13148313 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:cf494e076214ff56027c3026a88a3073eb690ee2c6b9041112a2249cbb90dbb2 17105232 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:35839841e4a7953374088d0ee15fefb516517ed475be2424deeffeafcd1a0977 28778047 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:c92eb7206354ecc438025b064092957b0570d3291e3f48afe0a6a77252e5cac8 4115740 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:3441e7098e1bde94ea36b0e80db0680a8b6d2f54b557d26d9de5b85040b6193d 4101066 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:b911caa20af7c6d103f0591f30ac6797dfd1479b262c307361d867cc72333804 1981762 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:b33d9f4cce97efbb394bf0392334f28083644d75cc869ba576e1b8b9fb4c43a5 2571472 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:bb6425f3e7a1b5086798fb14b0fe00c954a1588d031408801b16c511bca2c95a 3529276 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:8f0c401b2cc0a0dae538f90a5cceaf644584d3de104df2525a4170472c9b8343 1507076 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:36d9f12e12422fd1d65d7eb9ef85e5c620a24e2d4462984932a6e20ef7a0fa7d 3862508 
application/vnd.oci.image.layer.v1.tar+gzip} {sha256:fcf740ad1a6a4fa07962e17bf3e01f1fe59c30dfa5f5ebce1f0c1da460add209 3099825 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:66fc9285a7e7567b3b4abba90cfc5fea70f2fb171d27d3243ca2b9389b3c8c15 2548681 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:78d72caed6f491a5b0fcadf329dbae6b6bd29250614d1cb167ecee3f2e5caf62 4198913 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:56d06b478d901bbc7750868c67ba6d67187a2e9bb2f50dedbb257c4ffdee5f72 5202105 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:ee1127bb2b8bfe7f3bf92ade608695ea6bb2b54e5eb13bb59ec2bef066f73b4c 2209600 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:5e2efbb088d9115589904cc15f268bdd363f604ce2167b38fdf06c537a514222 25830340 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:735765d51d335492d7cd7993e571cc6eec39c15660ebe7341bd373f06be20a3a 23696856 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:735e724736a6c5cc5eeb23d8458ca9b5c9f9852f3434edf082867ce28a114d7c 78570158 application/vnd.oci.image.layer.v1.tar+gzip} {sha256:ad312c5c40ccd18a3c639cc139211f3c4284e568b69b2e748027cee057986fe0 2331 application/vnd.oci.image.layer.v1.tar+gzip}] pipeid=0
DEBU[0000] Executing method GetBlob
DEBU[0000] Sending reply: err=<nil> value=14046888 pipeid=9
DEBU[0000] Executing method FinishPipe
DEBU[0000] Completed pipe goroutine
DEBU[0000] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0000] Executing method GetBlob
DEBU[0000] Sending reply: err=<nil> value=66074403 pipeid=9
DEBU[0000] Executing method FinishPipe
DEBU[0000] Completed pipe goroutine
DEBU[0000] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0000] Executing method GetBlob
DEBU[0000] Sending reply: err=<nil> value=20479739 pipeid=9
DEBU[0000] Executing method FinishPipe
DEBU[0001] Completed pipe goroutine
DEBU[0001] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0001] Executing method GetBlob
DEBU[0001] Sending reply: err=<nil> value=17445305 pipeid=9
DEBU[0001] Executing method FinishPipe
DEBU[0001] Completed pipe goroutine
DEBU[0001] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0001] Executing method GetBlob
DEBU[0001] Sending reply: err=<nil> value=39748491 pipeid=9
DEBU[0001] Executing method FinishPipe
DEBU[0001] Completed pipe goroutine
DEBU[0001] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0001] Executing method GetBlob
DEBU[0001] Sending reply: err=<nil> value=39839588 pipeid=9
DEBU[0001] Executing method FinishPipe
DEBU[0001] Completed pipe goroutine
DEBU[0001] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0001] Executing method GetBlob
DEBU[0001] Sending reply: err=<nil> value=39372856 pipeid=9
DEBU[0001] Executing method FinishPipe
DEBU[0002] Completed pipe goroutine
DEBU[0002] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0002] Executing method GetBlob
DEBU[0002] Sending reply: err=<nil> value=15882737 pipeid=9
DEBU[0002] Executing method FinishPipe
DEBU[0002] Completed pipe goroutine
DEBU[0002] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0002] Executing method GetBlob
DEBU[0002] Sending reply: err=<nil> value=11459987 pipeid=9
DEBU[0002] Executing method FinishPipe
DEBU[0002] Completed pipe goroutine
DEBU[0002] Sending reply: err=<nil> value=<nil> pipeid=0
DEBU[0002] Executing method GetBlob
DEBU[0002] Sending reply: err=<nil> value=11720814 pipeid=9
DEBU[0002] Executing method FinishPipe
DEBU[0002] Completed pipe goroutine
DEBU[0002] Sending reply: err=<nil> value=<nil> pipeid=0
error: Importing: Unencapsulating base: Broken pipe (os error 32)

@jeckersb
Contributor

Does this fix it?

Nope, still fails like that (I had to modify your patch slightly to use only one .into_inner(), otherwise it angers the compiler).

@jeckersb
Contributor

More details on the timing here!

Running like so...

perf record --call-graph=dwarf -e probe:pipe_release -e raw_syscalls:sys_exit --filter "ret==-32" ostree container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc

gets us...

tokio-runtime-w    2640 [001]   576.753631:    probe:pipe_release: (ffffffffa54b69c4)
        ffffffffa54b69c5 pipe_release+0x5 ([kernel.kallsyms])
        ffffffffa54aa77e __fput+0xee ([kernel.kallsyms])
        ffffffffa54a4cbc __x64_sys_close+0x3c ([kernel.kallsyms])
        ffffffffa61b2952 do_syscall_64+0x82 ([kernel.kallsyms])
        ffffffffa620012f entry_SYSCALL_64+0xaf ([kernel.kallsyms])
            7f487d7f0f9c __libc_close+0x3c (inlined)
            565256eca225 core::ptr::drop_in_place<tokio::io::util::buf_reader::BufReader<tokio::net::unix::pipe::Receiver>>+0x1a5 (/usr/bin/rpm-ostree)
            565256ee3a3f tokio::runtime::task::raw::poll+0xe3f (/usr/bin/rpm-ostree)
            56525770f44c std::sys_common::backtrace::__rust_begin_short_backtrace+0x19c (/usr/bin/rpm-ostree)
            565257710570 core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h84eabe4fbbfa1fec+0x70 (/usr/bin/rpm-ostree)
            56525777bcda std::sys::pal::unix::thread::Thread::new::thread_start+0x2a (/usr/bin/rpm-ostree)
            7f487d77f6d6 start_thread+0x386 (/usr/lib64/libc.so.6)
            7f487d80360b clone3+0x2b (/usr/lib64/libc.so.6)

tokio-runtime-w    2638 [001]   576.753655: raw_syscalls:sys_exit: NR 1 = -32
        ffffffffa51c71c2 syscall_exit_to_user_mode_prepare+0x112 ([kernel.kallsyms])
        ffffffffa51c71c2 syscall_exit_to_user_mode_prepare+0x112 ([kernel.kallsyms])
        ffffffffa61ba220 syscall_exit_to_user_mode+0x10 ([kernel.kallsyms])
        ffffffffa61b295e do_syscall_64+0x8e ([kernel.kallsyms])
        ffffffffa620012f entry_SYSCALL_64+0xaf ([kernel.kallsyms])
            7f487d7f69bd __GI___libc_write+0x4d (/usr/lib64/libc.so.6)
            7f487d7f69bd __GI___libc_write+0x4d (/usr/lib64/libc.so.6)
            565257768d5c <std::os::unix::net::stream::UnixStream as std::io::Write>::write+0x1c (/usr/bin/rpm-ostree)
            565256eb45a5 std::io::Write::write_all+0x75 (/usr/bin/rpm-ostree)
            565256eb3d18 std::io::copy::generic_copy+0x198 (/usr/bin/rpm-ostree)

Which shows where the tokio::io::util::buf_reader::BufReader<tokio::net::unix::pipe::Receiver> gets dropped (closing the rx end), followed by the write to the tx end failing with -32.
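For comparison, here's a tiny standalone tokio program (not our code, just the same shape) where the read half gets dropped while the write half is still copying, and the writer fails with a broken pipe:

use tokio::io::{duplex, AsyncReadExt, AsyncWriteExt};

#[tokio::main]
async fn main() {
    // An in-memory duplex stands in for the pipe between the two tasks.
    let (mut tx, mut rx) = duplex(8 * 1024);

    // "Consumer": reads a little, then returns; dropping rx closes the read
    // half, like the BufReader<Receiver> drop_in_place in the trace above.
    let consumer = tokio::spawn(async move {
        let mut buf = [0u8; 1024];
        let _ = rx.read(&mut buf).await;
        // rx dropped here
    });

    // "Producer": keeps writing until it hits the broken pipe.
    let producer = tokio::spawn(async move {
        let chunk = [0u8; 4096];
        loop {
            if let Err(e) = tx.write_all(&chunk).await {
                eprintln!("producer write failed: {e}"); // broken pipe
                break;
            }
        }
    });

    let _ = tokio::join!(consumer, producer);
}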

@cgwalters
Member Author

I have a theory that there's a different underlying error that's being masked here.
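Roughly the shape I mean (just a sketch of the error-combining idea, not the actual join_fetch implementation): when the fetch driver and the import task both fail, the driver's error is usually the root cause and the importer's Broken pipe is just the symptom, so we'd want to prefer the former.

use anyhow::Result;
use std::future::Future;

/// Sketch only: run the import task and the fetch driver together and prefer
/// the driver's error, since a Broken pipe on the import side is often just a
/// side effect of the other side dying.
async fn join_fetch_sketch<T>(
    import: impl Future<Output = Result<T>>,
    driver: impl Future<Output = Result<()>>,
) -> Result<T> {
    let (import, driver) = tokio::join!(import, driver);
    match (import, driver) {
        (Ok(v), Ok(())) => Ok(v),
        // Driver failed: report that, even if the import also errored.
        (_, Err(e)) => Err(e),
        (Err(e), Ok(())) => Err(e),
    }
}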

@cgwalters
Member Author

Can you try with this?

diff --git a/lib/src/container/store.rs b/lib/src/container/store.rs
index a275c0af..f908f65f 100644
--- a/lib/src/container/store.rs
+++ b/lib/src/container/store.rs
@@ -744,7 +744,10 @@ impl ImageImporter {
                     txn.commit(Some(cancellable))?;
                     Ok::<_, anyhow::Error>(commit)
                 })
-                .map_err(|e| e.context(format!("Layer {}", layer.layer.digest())));
+                .map_err(|e| e.context(format!("Layer {}", layer.layer.digest())))
+                .inspect_err(|e| {
+                    dbg!(e);
+                });
             let commit = super::unencapsulate::join_fetch(import_task, driver).await?;
             layer.commit = commit;
             if let Some(p) = self.layer_progress.as_ref() {

@jeckersb
Contributor

Can you try with this?

No difference, still just fails with:
error: Importing: Unencapsulating base: Copying for decompression: Broken pipe (os error 32)

@jeckersb
Contributor

jeckersb commented Oct 22, 2024

Running under gdb, Thread 3 is where we get SIGPIPE and Thread 2 is where the dedicated blocking thread is executing. (Edit: looks like both have blocking bits, actually.)

gdb.txt

@cgwalters
Member Author

Can you capture in gdb the point where we call close on the reader fd?

@jeckersb
Contributor

Should be here. Note that once I add a breakpoint on drop and have it dump the stack, it slows things down enough that the problem no longer reproduces. But I suspect the drop path is probably the same whether or not the race hits?

Thread 11 "tokio-runtime-w" hit Breakpoint 1.1, core::ptr::drop_in_place<tokio::io::util::buf_reader::BufReader<tokio::net::unix::pipe::Receiver>> () at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574
574     in /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs
#0  core::ptr::drop_in_place<tokio::io::util::buf_reader::BufReader<tokio::net::unix::pipe::Receiver>> () at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574
#1  0x000055555583d705 in core::ptr::drop_in_place<alloc::boxed::Box<(dyn tokio::io::async_buf_read::AsyncBufRead + core::marker::Send + core::marker::Unpin), alloc::alloc::Global>> ()
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574
#2  core::ptr::drop_in_place<tokio_util::io::sync_bridge::SyncIoBridge<alloc::boxed::Box<(dyn tokio::io::async_buf_read::AsyncBufRead + core::marker::Send + core::marker::Unpin), alloc::alloc::Global>>> ()
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574
#3  core::ptr::drop_in_place<core::cell::UnsafeCell<tokio_util::io::sync_bridge::SyncIoBridge<alloc::boxed::Box<(dyn tokio::io::async_buf_read::AsyncBufRead + core::marker::Send + core::marker::Unpin), alloc::alloc::Global>>>> ()
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574
#4  core::ptr::drop_in_place<core::cell::RefCell<tokio_util::io::sync_bridge::SyncIoBridge<alloc::boxed::Box<(dyn tokio::io::async_buf_read::AsyncBufRead + core::marker::Send + core::marker::Unpin), alloc::alloc::Global>>>> ()
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574
#5  core::ptr::drop_in_place<tar::archive::ArchiveInner<tokio_util::io::sync_bridge::SyncIoBridge<alloc::boxed::Box<(dyn tokio::io::async_buf_read::AsyncBufRead + core::marker::Send + core::marker::Unpin), alloc::alloc::Global>>>> ()
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574
#6  core::ptr::drop_in_place<tar::archive::Archive<tokio_util::io::sync_bridge::SyncIoBridge<alloc::boxed::Box<(dyn tokio::io::async_buf_read::AsyncBufRead + core::marker::Send + core::marker::Unpin), alloc::alloc::Global>>>> ()
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ptr/mod.rs:574
#7  ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure#3} (cancellable=0x7fffef1ff398) at /var/home/jeckersb/git/ostree-rs-ext/lib/src/container/store.rs:793
#8  ostree_ext::tokio_util::spawn_blocking_cancellable::{closure#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>> ()
    at /var/home/jeckersb/git/ostree-rs-ext/lib/src/tokio_util.rs:51
#9  tokio::runtime::blocking::task::{impl#2}::poll<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::s
tring::String, anyhow::Error>>, core::result::Result<alloc::string::String, anyhow::Error>> (self=..., _cx=<optimized out>) at /var/home/jeckersb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/blocking/task.rs:42
#10 0x000055555580031d in tokio::runtime::task::core::{impl#6}::poll::{closure#0}<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{
async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule> (ptr=0x5555566d6b38)
    at /var/home/jeckersb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/core.rs:331
#11 tokio::loom::std::unsafe_cell::UnsafeCell<tokio::runtime::task::core::Stage<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{as
ync_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>>>::with_mut<tokio::runtime::task::core::Stage<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closur
e_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>>, core::task::poll::Poll<core::result::Result<alloc::string::String, anyhow::E
rror>>, tokio::runtime::task::core::{impl#6}::poll::{closure_env#0}<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{
async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule>> (self=0x5555566d6b38, f=...)
    at /var/home/jeckersb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/loom/std/unsafe_cell.rs:16
#12 tokio::runtime::task::core::Core<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}
, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::contain
er::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule> (self=0x5555566d6b20, cx=...)
    at /var/home/jeckersb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/core.rs:320
#13 0x00005555558307d5 in tokio::runtime::task::harness::poll_future::{closure#0}<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{
async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule> ()
    at /var/home/jeckersb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:500
#14 core::panic::unwind_safe::{impl#23}::call_once<core::task::poll::Poll<core::result::Result<alloc::string::String, anyhow::Error>>, tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTask<ostree_ext::toki
o_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule
::BlockingSchedule>> (self=...) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272
#15 std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_
ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule>>, core::task::poll::Poll<core::result::
Result<alloc::string::String, anyhow::Error>>> (data=<error reading variable: Cannot access memory at address 0x0>) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554
#16 std::panicking::try<core::task::poll::Poll<core::result::Result<alloc::string::String, anyhow::Error>>, core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTask<
ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::bl
ocking::schedule::BlockingSchedule>>> (f=...) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518
#17 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext:
:container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule>>, core::task::poll::Poll<core::result::Resu
lt<alloc::string::String, anyhow::Error>>> (f=...) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345
#18 tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{clos
ure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule> (core=0x5555566d6b20, cx=...)
    at /var/home/jeckersb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:488
#19 tokio::runtime::task::harness::Harness<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_
env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll_inner<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_
ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule> (self=<optimized out>)
    at /var/home/jeckersb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:209
#20 tokio::runtime::task::harness::Harness<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::container::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_
env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule>::poll<tokio::runtime::blocking::task::BlockingTask<ostree_ext::tokio_util::spawn_blocking_cancellable::{closure_env#0}<ostree_ext::c
ontainer::store::{impl#5}::unencapsulate_base::{async_fn#0}::{async_block#0}::{closure_env#3}, core::result::Result<alloc::string::String, anyhow::Error>>>, tokio::runtime::blocking::schedule::BlockingSchedule> (self=...)
    at /var/home/jeckersb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/task/harness.rs:154
#21 0x000055555618c8df in tokio::runtime::task::raw::RawTask::poll (self=...) at src/runtime/task/raw.rs:201
#22 tokio::runtime::task::UnownedTask<tokio::runtime::blocking::schedule::BlockingSchedule>::run<tokio::runtime::blocking::schedule::BlockingSchedule> (self=...) at src/runtime/task/mod.rs:473
#23 tokio::runtime::blocking::pool::Task::run (self=...) at src/runtime/blocking/pool.rs:160
#24 tokio::runtime::blocking::pool::Inner::run (self=0x5555566bbe90, worker_thread_id=9) at src/runtime/blocking/pool.rs:518
#25 0x000055555619e0c9 in tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure#0} () at src/runtime/blocking/pool.rs:476
#26 std::sys::backtrace::__rust_begin_short_backtrace<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()> (f=...) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/sys/backtrace.rs:154
#27 0x000055555619307c in std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()> ()
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:522
#28 core::panic::unwind_safe::{impl#23}::call_once<(), std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>> (
    self=<error reading variable: Cannot access memory at address 0x0>) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272
#29 std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>>, ()> (
    data=<error reading variable: Cannot access memory at address 0x0>) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554
#30 std::panicking::try<(), core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>>> (
    f=<error reading variable: Cannot access memory at address 0x0>) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518
#31 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>>, ()> (
    f=<error reading variable: Cannot access memory at address 0x0>) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345
#32 std::thread::{impl#0}::spawn_unchecked_::{closure#1}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()> () at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:521
#33 core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>, ()> ()
    at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs:250
#34 0x00005555561f66eb in alloc::boxed::{impl#48}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at alloc/src/boxed.rs:2231
#35 alloc::boxed::{impl#48}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> () at alloc/src/boxed.rs:2231
#36 std::sys::pal::unix::thread::{impl#2}::new::thread_start () at std/src/sys/pal/unix/thread.rs:105
#37 0x00007ffff697f6d7 in start_thread (arg=<optimized out>) at pthread_create.c:447
#38 0x00007ffff6a0360c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

@cgwalters
Copy link
Member Author

Ohh right, in the current code it's the tar stream that owns the reader; does this help?

diff --git a/lib/src/container/store.rs b/lib/src/container/store.rs
index a275c0af..0ee99de0 100644
--- a/lib/src/container/store.rs
+++ b/lib/src/container/store.rs
@@ -730,8 +730,8 @@ impl ImageImporter {
                 crate::tokio_util::spawn_blocking_cancellable_flatten(move |cancellable| {
                     let txn = repo.auto_transaction(Some(cancellable))?;
                     let mut importer = crate::tar::Importer::new_for_object_set(&repo);
-                    let blob = tokio_util::io::SyncIoBridge::new(blob);
-                    let mut archive = tar::Archive::new(blob);
+                    let mut blob = tokio_util::io::SyncIoBridge::new(blob);
+                    let mut archive = tar::Archive::new(&mut blob);
                     importer.import_objects(&mut archive, Some(cancellable))?;
                     let commit = if write_refs {
                         let commit = importer.finish_import_object_set()?;
@@ -742,6 +742,7 @@ impl ImageImporter {
                         None
                     };
                     txn.commit(Some(cancellable))?;
+                    drop(blob);
                     Ok::<_, anyhow::Error>(commit)
                 })
                 .map_err(|e| e.context(format!("Layer {}", layer.layer.digest())));
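
For context on what the diff above changes: with tar::Archive::new(blob) the archive takes ownership of the SyncIoBridge, so the underlying reader is closed wherever the archive happens to be dropped, while borrowing it leaves the caller in control of when the read side goes away. A minimal illustrative sketch of the two shapes (not the ostree-ext code):

use std::io::Read;

// The archive owns the reader: the reader (and any fd inside it) is
// closed wherever `archive` happens to be dropped.
fn import_owned<R: Read>(reader: R) -> std::io::Result<()> {
    let mut archive = tar::Archive::new(reader);
    for entry in archive.entries()? {
        let _ = entry?;
    }
    Ok(())
}

// The caller keeps ownership: the reader stays alive until the caller
// explicitly drops it, e.g. only after the transaction has committed.
fn import_borrowed<R: Read>(reader: &mut R) -> std::io::Result<()> {
    let mut archive = tar::Archive::new(reader);
    for entry in archive.entries()? {
        let _ = entry?;
    }
    Ok(())
}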

@jeckersb
Copy link
Contributor

Does not help.

Another thing worth noting is that I tried forcing tokio, tokio-util, and tokio-stream to the latest versions, just to make sure there wasn't some random upstream fix there, but that also did not help.

@cgwalters
Copy link
Member Author

For reference, instead of directly writing to cgroups I was able to reproduce with taskset -c 0, which is easier.

So a core loop would look like this:

# ostree --repo=repo refs --delete ostree/container; taskset -c 0 ostree container image pull --quiet repo ostree-unverified-image:oci:fedora-bootc

The first command flushes the refs for the layers in the existing repo.

cgwalters added a commit to cgwalters/ostree-rs-ext that referenced this issue Oct 22, 2024
This took a crazy long time to debug but after lots of false
starts I think this is right. Basically what's going on is
we have async tasks that are talking over a `pipe()` inside
our own process.

We must not close the read side of the pipe until the
writer is done.

I believe this is dependent on tokio task scheduling order,
and it's way easier to reproduce when pinned to a single CPU.

Closes: ostreedev#657
Signed-off-by: Colin Walters <[email protected]>
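
For illustration, here is a minimal self-contained sketch of that failure mode. It is not the ostree-ext code; tokio::io::duplex stands in for the real pipe() to the skopeo proxy, but the shape is the same: if the read side is dropped before the writer is done, the writer sees a broken pipe.

use tokio::io::{AsyncReadExt, AsyncWriteExt};

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // An in-memory duplex pair standing in for the pipe between tasks.
    let (mut reader, mut writer) = tokio::io::duplex(64);

    let writer_task = tokio::spawn(async move {
        // Keeps writing; once the other end is gone this fails.
        for _ in 0..1024 {
            if let Err(e) = writer.write_all(&[0u8; 64]).await {
                eprintln!("writer saw: {e}"); // e.g. "broken pipe"
                return;
            }
        }
    });

    // The "importer" reads a little, then drops its end early,
    // before the writer has finished.
    let mut buf = [0u8; 64];
    let _ = reader.read(&mut buf).await;
    drop(reader);

    writer_task.await.unwrap();
}

Whether the writer has already finished by the time the reader is dropped depends entirely on how the tasks get scheduled, which is why pinning to a single CPU makes the bug so much easier to hit.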
@cgwalters
Copy link
Member Author

@jeckersb I think my original fix was actually right... but notice that we have two copies of this code for bad reasons, and I accidentally changed only the second one (which handles derived/non-ostree layers, which weren't relevant here).

#672 seems to work for me...can you try it and review?

cgwalters added a commit to cgwalters/ostree-rs-ext that referenced this issue Oct 23, 2024
This took a crazy long time to debug but after lots of false
starts I think this is right. Basically what's going on is
we have async tasks that are talking over a `pipe()` inside
our own process.

We must not close the read side of the pipe until the
writer is done.

I believe this is dependent on tokio task scheduling order,
and it's way easier to reproduce when pinned to a single CPU.

Closes: ostreedev#657
Signed-off-by: Colin Walters <[email protected]>
@cgwalters
Copy link
Member Author

OK the thing I'm thinking we should try now:

  • Change fetch_layer_decompress to just fetch_layer, i.e. don't decompress there
  • Have it return the media type
  • Then in the places where we're making worker threads (see spawn_blocking_cancellable_flatten) we basically do:
diff --git a/lib/src/container/store.rs b/lib/src/container/store.rs
index a275c0a..fe16625 100644
--- a/lib/src/container/store.rs
+++ b/lib/src/container/store.rs
@@ -731,6 +731,7 @@ impl ImageImporter {
                     let txn = repo.auto_transaction(Some(cancellable))?;
                     let mut importer = crate::tar::Importer::new_for_object_set(&repo);
                     let blob = tokio_util::io::SyncIoBridge::new(blob);
+                    // Create decompressor here based on media type
                     let mut archive = tar::Archive::new(blob);
                     importer.import_objects(&mut archive, Some(cancellable))?;
                     let commit = if write_refs {

(There and in the other place too)
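
For illustration, a sketch of what "Create decompressor here based on media type" could look like on the blocking side. This is not the actual patch; the crate and variant names used here (oci_spec::image::MediaType, flate2) are assumptions about what a synchronous implementation might use.

use std::io::Read;

use anyhow::Result;
use oci_spec::image::MediaType;

// Wrap the raw (still-compressed) layer stream in a synchronous
// decompressor chosen from the layer's media type. This runs entirely
// on the blocking worker thread, after SyncIoBridge::new(blob).
fn new_sync_decompressor<'a>(
    media_type: &MediaType,
    src: impl Read + Send + 'a,
) -> Result<Box<dyn Read + Send + 'a>> {
    Ok(match media_type {
        MediaType::ImageLayerGzip => Box::new(flate2::read::GzDecoder::new(src)),
        MediaType::ImageLayer => Box::new(src),
        // zstd and friends would follow the same pattern with their own readers.
        other => anyhow::bail!("unsupported layer media type: {other:?}"),
    })
}

The idea, as described in the diff comment above, is that tar::Archive::new(...) would then consume the returned reader instead of an already-decompressed async stream.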

@jeckersb
Copy link
Contributor

I have a tentative patch that moves the decompressor bits inline; it needs a bit of cleanup just for readability's sake, and then I will post a draft.

I think there's either a bug in my new implementation, or my new implementation is uncovering some pre-existing bug.

On current main, when running the tests I occasionally get:

---- test_container_var_content stdout ----
thread 'test_container_var_content' panicked at lib/tests/it/main.rs:1070:41:
called `Result::unwrap()` on an `Err` value: Importing

Caused by:
    0: Parsing layer blob sha256:d461c2d98f69ff4ea32e8e352eb852a6d40567a4d5f5e46496e0400c2378c7d6
    1: error: opening repo: opendir(/proc/self/fd/3): No such file or directory
    2: Processing tar
    3: Failed to commit tar: ExitStatus(unix_wait_status(256))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

With my patch, I still occasionally get that same error. However, I now also occasionally get this error:

---- test_container_import_export_v1 stdout ----
$ ostree --repo=dest/repo remote gpg-import --stdin myremote
thread 'test_container_import_export_v1' panicked at lib/tests/it/main.rs:545:10:
called `Result::unwrap()` on an `Err` value: exporting to ociarchive

Caused by:
    0: Skopeo copy
    1: skopeo failed: time="2024-10-30T14:37:38-04:00" level=fatal msg="initializing source oci:/proc/self/fd/3:: open /proc/self/fd/3/index.json: no such file or directory"


note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Which is similar to, but distinct from, the first. Will need to investigate a bit more.

@cgwalters
Copy link
Member Author

Yeah, we've sadly always had this bug: #664.
I've just been clicking retry.

@jeckersb
Copy link
Contributor

I guess I should dig more into #664 as part of this then. I'd like to understand better what's going on there so I can at least be somewhat confident I'm not making things worse.

jeckersb added a commit to jeckersb/ostree-rs-ext that referenced this issue Oct 31, 2024
This (with cautious optimism) appears to fix ostreedev#657.

Signed-off-by: John Eckersberg <[email protected]>
@jeckersb
Copy link
Contributor

I realized that main exhibits both of the previous errors, so it's nothing to do with my proposed fix in #677. So I think this might finally be fixed and I can look into #664 independently after this.

jeckersb added a commit to jeckersb/ostree-rs-ext that referenced this issue Oct 31, 2024
This (with cautious optimism) appears to fix ostreedev#657.

Signed-off-by: John Eckersberg <[email protected]>
@ggiguash
Copy link

ggiguash commented Nov 1, 2024

@jeckersb, can you explain whether the current fix is supposed to resolve this error, or whether we also need to wait for #664?

@jeckersb
Copy link
Contributor

jeckersb commented Nov 4, 2024

@jeckersb, can you explain whether the current fix is supposed to resolve this error, or whether we also need to wait for #664?

This should be sufficient to fix that error. #664 is similar, but is a separate problem.

@ggiguash
Copy link

ggiguash commented Nov 5, 2024

@jeckersb, when do we expect this fix to be merged into CentOS Stream 9?
We are getting more errors of this nature very frequently (e.g., this and this log).

@jeckersb
Copy link
Contributor

jeckersb commented Nov 5, 2024

@jeckersb, when do we expect this fix to be merged into CentOS Stream 9? We are getting more errors of this nature very frequently (e.g., this and this log).

Hopefully soon! We just released 1.1.2, so now it's just a matter of getting that pushed through the CentOS Stream processes. @jmarrero I'd like to run through the CentOS release process with you so I can take some of the burden off of you 😄

@jeckersb
Copy link
Contributor

jeckersb commented Nov 5, 2024

Actually, for those two specific errors that you posted, it looks like the error happened in ostree-container from rpm-ostree, so we need to make sure that's updated in c9s as well.
