@std/tar: untar_stream fails to extract PAX archives #6083

Closed
johnstonmatt opened this issue Oct 1, 2024 · 12 comments · Fixed by #6199
Labels
bug Something isn't working

Comments

@johnstonmatt

The archive is https://github.com/polyseam/cndi/releases/download/v2.20.0/cndi-mac.tar.gz

It is generated with deno compile, i.e. deno task build:
https://github.com/polyseam/cndi/blob/747d14b16779e761c05d80917c8fc141cdae259b/deno.json#L18

Extracting it fails with:

"Cannot extract the tar archive: An archive entry has invalid header checksum"

@BlackAsLight
Contributor

This might be the same bug that was fixed in #6064. That fix just hasn't been released on JSR yet.

@tstachl

tstachl commented Oct 11, 2024

I checked, and #6064 is merged and part of v0.1.2 on JSR, but the header checksum error is still present.

deno 1.46.3 (stable, release, aarch64-apple-darwin)
v8 12.9.202.5-rusty
typescript 5.5.2

Steps to reproduce:

  1. Download a tarball from a github repo
  2. Change the filename in the script
  3. Run the script

import { UntarStream } from "@std/tar/untar-stream";
import { dirname, normalize } from "@std/path";

for await (
  const entry of (await Deno.open("./CHANGE_ME.tar.gz"))
    .readable
    .pipeThrough(new DecompressionStream("gzip"))
    .pipeThrough(new UntarStream())
) {
  const path = normalize(entry.path);
  // recursive: true so nested or already-existing directories don't throw
  await Deno.mkdir(dirname(path), { recursive: true });
  await entry.readable?.pipeTo((await Deno.create(path)).writable);
}

$ deno run --allow-write --allow-read untar.ts
error: Uncaught (in promise) SyntaxError: Cannot extract the tar archive: An archive entry has invalid header checksum
        throw new SyntaxError(
              ^
    at UntarStream.#untar (https://jsr.io/@std/tar/0.1.2/untar_stream.ts:241:15)
    at eventLoopTick (ext:core/01_core.js:175:7)
    at async ReadableStreamDefaultController.<anonymous> (ext:deno_web/06_streams.js:5211:19)

@BlackAsLight
Contributor

@tstachl are you sure you're using v0.1.2? I tested it with the original archive mentioned and it worked fine for me.
[Screenshot, 2024-10-12 at 20:41:07]

@tstachl

tstachl commented Oct 12, 2024

@BlackAsLight yes, I'm using v0.1.2

{
  "imports": {
    "@std/path": "jsr:@std/path@^1.0.6",
    "@std/tar": "jsr:@std/tar@^0.1.2"
  }
}

And I just downloaded the original file and tested it with that as well:

import { UntarStream } from "@std/tar/untar-stream";
import { dirname, normalize } from "@std/path";

for await (
  const entry of (await Deno.open("./cndi-mac.tar.gz"))
    .readable
    .pipeThrough(new DecompressionStream("gzip"))
    .pipeThrough(new UntarStream())
) {
  const path = normalize(entry.path);
  // recursive: true so nested or already-existing directories don't throw
  await Deno.mkdir(dirname(path), { recursive: true });
  await entry.readable?.pipeTo((await Deno.create(path)).writable);
}

$ deno run --allow-write --allow-read tasks/download.ts
error: Uncaught (in promise) SyntaxError: Cannot extract the tar archive: An archive entry has invalid header checksum
        throw new SyntaxError(
              ^
    at UntarStream.#untar (https://jsr.io/@std/tar/0.1.2/untar_stream.ts:241:15)
    at eventLoopTick (ext:core/01_core.js:175:7)
    at async ReadableStreamDefaultController.<anonymous> (ext:deno_web/06_streams.js:5211:19)

@tstachl

tstachl commented Oct 12, 2024

Let me know if there is any additional debugging that I can provide.

@BlackAsLight
Contributor

I'm not sure why you'd get a different outcome from me with the same code, or how to proceed from here.

@tstachl

tstachl commented Oct 13, 2024

@BlackAsLight it works now for the archive mentioned in the original post, I must have had the older version cached, but it still doesn't work for this one:

https://github.com/themesberg/flowbite-icons/archive/refs/tags/v1.3.0.tar.gz

Any chance you can try it with this one and let me know if that works for you?

@BlackAsLight
Contributor

> @BlackAsLight it works now for the archive mentioned in the original post, I must have had the older version cached, but it still doesn't work for this one:
>
> https://github.com/themesberg/flowbite-icons/archive/refs/tags/v1.3.0.tar.gz
>
> Any chance you can try it with this one and let me know if that works for you?

This tar file seems to be encoded with the "Pax Interchange Format" and not the "POSIX ustar Format". @std/tar only supports the POSIX ustar and old-style tar formats, not pax, which is why it fails to decode it.
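For anyone who wants to check what kind of archive they have before handing it to UntarStream, here is a minimal sketch that inspects the first 512-byte header block. The offsets (typeflag at byte 156, magic at byte 257) come from the ustar header layout; `classifyHeader` and `HeaderKind` are hypothetical names for illustration and are not part of @std/tar.

```typescript
// Field offsets within a 512-byte ustar-style header block.
const TYPEFLAG_OFFSET = 156;
const MAGIC_OFFSET = 257;

type HeaderKind = "pax-global" | "pax-extended" | "ustar" | "other";

// Classify a single header block by its typeflag and magic.
// Hypothetical helper, not an @std/tar API.
function classifyHeader(block: Uint8Array): HeaderKind {
  if (block.length < 512) {
    throw new RangeError("a tar header block is 512 bytes");
  }
  const typeflag = String.fromCharCode(block[TYPEFLAG_OFFSET]);
  if (typeflag === "g") return "pax-global"; // pax global extended header
  if (typeflag === "x") return "pax-extended"; // pax per-file extended header
  const magic = new TextDecoder().decode(
    block.subarray(MAGIC_OFFSET, MAGIC_OFFSET + 5),
  );
  return magic === "ustar" ? "ustar" : "other";
}

// Synthetic block that mimics a pax global header: typeflag "g" plus ustar magic.
const demo = new Uint8Array(512);
demo[TYPEFLAG_OFFSET] = "g".charCodeAt(0);
demo.set(new TextEncoder().encode("ustar\0" + "00"), MAGIC_OFFSET);
console.log(classifyHeader(demo)); // "pax-global"
```

To use it on a real archive you would first decompress the stream and read its first 512 bytes; that part is left out here.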

@tstachl

tstachl commented Oct 14, 2024

Good to know, thank you for checking!

@lucacasonato changed the title from "@std/tar: untar_stream fails to extract a macos archive with "archive entry has invalid header checksum"" to "@std/tar: untar_stream fails to extract PAX archives" on Nov 19, 2024
@lucacasonato added the bug label on Nov 19, 2024
@kt3k
Member

kt3k commented Nov 19, 2024

@BlackAsLight do you know whether std/archive supported the PAX format?

@BlackAsLight
Contributor

According to its documentation it only supported the ustar format, but did allow the pax format to be read without supporting its additional features.

@BlackAsLight
Contributor

Looking a bit at the pax format and the encoding of https://github.com/themesberg/flowbite-icons/archive/refs/tags/v1.3.0.tar.gz, this tarball doesn't seem to comply with the pax format. I am unable to figure out how this tarball is calculating its checksum. For instance, this tarball has the typeflag g, which is part of the pax format, but uses the ustar magic number and version instead of the pax ones.

This is the first header of the file. The checksum calculation based off the ustar format should end up being the unsigned integer 6477. The magic number for a pax file should be "ustar " (a space instead of a null), and the version should be " \0" (a space followed by a null).

{
  name: "pax_global_header",
  mode: 66,
  uid: 0,
  gid: 0,
  size: 52,
  mtime: 1728636111,
  typeflag: "g",
  linkname: "",
  rawChecksumRange: Uint8Array(8) [
    48, 48, 49, 52,
    53, 49, 53,  0
  ],
  magic: "ustar\x00",
  version: "00",
  uname: "root",
  gname: "root",
  devmajor: "0000000",
  devminor: "0000000",
  prefix: ""
}
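As a cross-check on the numbers above, here is a small sketch of the ustar checksum rule: sum every byte of the 512-byte header while treating the 8-byte checksum field (offset 148) as ASCII spaces, then compare against the octal value stored in that field. `parseOctalField` and `computeChecksum` are hypothetical helper names, not @std/tar functions. Decoding the rawChecksumRange bytes from the dump gives octal 0014515, which is 6477, so the stored value does seem to agree with the expected one. How untar_stream itself reads that field is not shown here, so treat any conclusion about the cause as an assumption.

```typescript
const CHECKSUM_OFFSET = 148;
const CHECKSUM_LENGTH = 8;

// Parse an octal number field: skip leading spaces, read octal digits,
// stop at the first NUL, space, or other non-octal byte.
function parseOctalField(field: Uint8Array): number {
  let value = 0;
  let seenDigit = false;
  for (const byte of field) {
    if (byte >= 0x30 && byte <= 0x37) {
      value = value * 8 + (byte - 0x30);
      seenDigit = true;
    } else if (byte === 0x20 && !seenDigit) {
      continue; // leading padding
    } else {
      break; // terminator
    }
  }
  return value;
}

// Sum all 512 header bytes, counting the checksum field as eight spaces.
function computeChecksum(header: Uint8Array): number {
  if (header.length < 512) {
    throw new RangeError("a tar header block is 512 bytes");
  }
  let sum = 0;
  for (let i = 0; i < 512; i++) {
    const inChecksumField = i >= CHECKSUM_OFFSET &&
      i < CHECKSUM_OFFSET + CHECKSUM_LENGTH;
    sum += inChecksumField ? 0x20 : header[i];
  }
  return sum;
}

// The checksum bytes from the header dump: "0014515" followed by NUL.
const rawChecksumRange = new Uint8Array([48, 48, 49, 52, 53, 49, 53, 0]);
console.log(parseOctalField(rawChecksumRange)); // 6477
```

Note that the field here is seven octal digits followed by a NUL. Other writers use six digits, a NUL, and a space, so a reader probably needs to accept both layouts.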

Skipping the checksum check entirely and continuing with the untarring process does seem to pull out all the files, with the pax metadata also ending up as additional files. That is somewhat expected behaviour according to the spec:

> In particular, older implementations that do not fully support these extensions will extract the metadata into regular files, where the metadata can be examined as necessary.
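If the metadata does end up extracted as a regular file, its contents can be read with a few lines of code. In the pax format each record is "LENGTH KEY=VALUE\n", where LENGTH is a decimal count of the whole record including the length digits, the space, and the trailing newline. Below is a minimal sketch of a parser for that layout; `parsePaxRecords` is a hypothetical name and the sample hash is a placeholder. The size of 52 in the header above would be consistent with a single comment record carrying a 40-character commit hash, but that is an assumption on my part.

```typescript
// Parse pax extended header records of the form "LENGTH KEY=VALUE\n".
// Pass exactly `size` bytes of entry data, without the NUL block padding.
function parsePaxRecords(data: Uint8Array): Map<string, string> {
  const decoder = new TextDecoder();
  const records = new Map<string, string>();
  let offset = 0;
  while (offset < data.length) {
    const space = data.indexOf(0x20, offset);
    if (space === -1) break; // no further records
    const length = Number.parseInt(
      decoder.decode(data.subarray(offset, space)),
      10,
    );
    // LENGTH must cover at least the digits, the space, and the newline,
    // and must not run past the end of the data.
    if (
      !Number.isInteger(length) ||
      length <= space - offset + 1 ||
      offset + length > data.length
    ) {
      throw new SyntaxError("malformed pax record");
    }
    // Drop the trailing newline, then split on the first "=".
    const body = decoder.decode(data.subarray(space + 1, offset + length - 1));
    const eq = body.indexOf("=");
    if (eq === -1) throw new SyntaxError("pax record is missing '='");
    records.set(body.slice(0, eq), body.slice(eq + 1));
    offset += length;
  }
  return records;
}

// Placeholder record: 2 digits + space + "comment=" + 40 chars + newline = 52 bytes.
const placeholderHash = "0".repeat(40);
const sample = new TextEncoder().encode(`52 comment=${placeholderHash}\n`);
console.log(sample.length); // 52
console.log(parsePaxRecords(sample).get("comment") === placeholderHash); // true
```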
