Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File content CRC32 values are not written to the index #45

Closed
creachadair opened this issue Jun 12, 2019 · 2 comments · Fixed by #46
Closed

File content CRC32 values are not written to the index #45

creachadair opened this issue Jun 12, 2019 · 2 comments · Fixed by #46
Labels

Comments

@creachadair
Copy link
Contributor

creachadair commented Jun 12, 2019

Repro:

echo "foo" > test.txt
siva pack ts test.txt
xxd ts

Expected: Checksum is 7e3265a8(hex).
Observed: Checksum is 0.

Although the repro only uses one file, the problem affects archives with multiple files too. The basic problem is a double-flush. What siva pack does for each file is basically:

  1. Call WriteHeader, adding a new IndexEntry for the next file.
  2. Write the file contents.
  3. Call Flush.

Step (3) populates the CRC32 for the index entry added in (1) correctly, and then resets the CRC context. However, when the next file is added, the first thing WriteHeader does is to flushIfPending. Since the previous call to Flush did not remove the old IndexEntry, this second flush succeeds—but since the CRC context was reset the value is now zero.

This cascades through the archive as more files are added. Even if the archive contains only one file, the Close operation also performs a Flush, so it has the same effect.

@jfontan
Copy link
Contributor

jfontan commented Jun 14, 2019

@creachadair Is this change backwards compatible? As I understand Entries CRC is not checked.

@creachadair
Copy link
Contributor Author

creachadair commented Jun 14, 2019

@creachadair Is this change backwards compatible? As I understand Entries CRC is not checked.

I think so. The spec says the field should be populated but it's possible someone has written software to assume that field is empty.

The library populates the field, and the existing unit tests for the library check that it is correct. The command-line tool drove this corner case by explicitly calling Flush (which is marked as "optional" but prior to #46 was "unsafe"). So unless someone wrote a tool that reads the output from the CLI and checks that the per-file CRCs are 0 (which would be weird, and I think therefore unlikely), it should be safe.

I don't think the command-line tool verifies the checksums.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants