Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SkipBlockAlignmentPadding must be executed for all entries #74396

Merged
merged 1 commit into from
Aug 23, 2022

Conversation

jozkee
Copy link
Member

@jozkee jozkee commented Aug 23, 2022

Fixes #74242

For entries that were not _size % 512 == 0 we were not reading the block padding i.e: the zeroes after the entries' content was over and before the 512 block was completed. That was causing that the next entry was read with leading 0x0s.
FWIW: this only affected unseekable streams such as GZipStream.

@jozkee jozkee added this to the 7.0.0 milestone Aug 23, 2022
@jozkee jozkee self-assigned this Aug 23, 2022
@ghost
Copy link

ghost commented Aug 23, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #74242

For entries that were not _size % 512 == 0 we were not reading the block padding i.e: the zeroes after the entries' content was over and before the 512 block was completed. That was causing that the next entry was read with leading 0x0s.

Author: Jozkee
Assignees: Jozkee
Labels:

area-System.IO

Milestone: 7.0.0

@ghost
Copy link

ghost commented Aug 23, 2022

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #74242

For entries that were not _size % 512 == 0 we were not reading the block padding i.e: the zeroes after the entries' content was over and before the 512 block was completed. That was causing that the next entry was read with leading 0x0s.
FWIW: this only affected unseekable streams such as GZipStream.

Author: Jozkee
Assignees: Jozkee
Labels:

area-System.IO.Compression

Milestone: 7.0.0

Copy link
Member

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as it is. I left some low pri comments. No need to restart the CI for them.
Thanks for submitting, @jozkee !

Comment on lines +288 to +289
e = reader.GetNextEntry();
Assert.Null(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would usually put these together. But this is fine too. No need to restart the CI for this.

Assert.Null(reader.GetNextEntry());

And the async variant:

Assert.Null(await reader.GetNextEntryAsync());

public void BlockAlignmentPadding_DoesNotAffectNextEntries(int contentSize)
{
byte[] fileContents = new byte[contentSize];
Array.Fill<byte>(fileContents, 0x1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. A 0x1 char is not a valid char in any of the header fields, so if the archive is not reading entries from the correct position, this char would cause an exception when collecting the header fields.

}

archive.Position = 0;
using var unseekable = new WrappedStream(archive, archive.CanRead, archive.CanWrite, canSeek: false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total nit. I would've used the arguments (archive, canRead: true, canWrite: true, canSeek: false). No need to restart the CI for this.

Assert.Equal(contentSize, e.Length);

byte[] buffer = new byte[contentSize];
while (e.DataStream.Read(buffer) > 0) ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would've added an Assert.NotNull(e.DataStream) before the while, but this works too. The first while loop will throw if it's null.

using var reader = new TarReader(unseekable);

TarEntry e = reader.GetNextEntry();
Assert.Equal(contentSize, e.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would've added a couple other asserts when checking both entries:

Assert.Equal("file", e.Name);
Assert.Equal(TarEntryType.RegularFile, e.EntryType);

The name field starts at byte 0 of the header. But I guess GetNextEntry() would throw if we aren't reading from the right position when it encounters an unexpected character.

@carlossanlop carlossanlop self-requested a review August 23, 2022 05:42
Copy link
Member

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a bug when calling GetNextEnrty(copyData: true). I'm investigating.

Copy link
Member

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I investigated and no, this change is not introducing a bug when using copyData: true. So this is good to merge.

I thought there was a bug because I cherry-picked this fix into my local machine, and one of my new tests, which uses copyData: true, was failing when I was attempting to read the copied DataStream. But I discovered that I just had to rewind the stream to position 0, because we return it with the pointer positioned at the end of the stream.

It was unexpected and I suspect it can be confusing to the user, so I plan on fixing it in my next PR. I will include a test to verify that the position of a copied DataStream is rewinded when the user accesses it for the first time.

@carlossanlop carlossanlop merged commit 275c1e5 into dotnet:main Aug 23, 2022
@carlossanlop carlossanlop deleted the tar_alignment_padding branch August 23, 2022 06:39
carlossanlop pushed a commit to carlossanlop/runtime that referenced this pull request Aug 23, 2022
carlossanlop pushed a commit to carlossanlop/runtime that referenced this pull request Aug 23, 2022
carlossanlop added a commit that referenced this pull request Aug 24, 2022
* Improve performance of Tar library (#74281)

* Avoid unnecessary byte[] allocations

* Remove unnecessary use of FileStreamOptions

* Clean up Dispose{Async} implementations

* Clean up unnecessary consts

Not a perf thing, just readability.

* Remove MemoryStream/Encoding.UTF8.GetBytes allocations, unnecessary async variants, and overhaul GenerateExtendedAttributesDataStream

* Avoid string allocations in ReadMagicAttribute

* Avoid allocation in WriteAsOctal

* Improve handling of octal

* Avoid allocation for version string

* Removing boxing and char string allocation in GenerateExtendedAttributeName

* Fix a couple unnecessary dictionary lookups

* Replace Enum.HasFlag usage

* Remove allocations from Write{Posix}Name

* Replace ArrayPool use with string.Create

* Replace more superfluous ArrayPool usage

* Remove ArrayPool use from System.IO.Compression.ZipFile

* Fix inverted condition

* Use generic math to parse octal

* Remove allocations from StringReader and string.Split

* Remove magic string allocation for Ustar when not V7

* Remove file name and directory name allocation in GenerateExtendedAttributeName

* fix tar strings (#74321)

* Fix some corner cases in TarReader (#74329)

* Fix a few Tar issues post perf improvements (#74338)

* Fix a few Tar issues post perf improvements

* Update src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs

* Skip directory symlink recursion on TarFile archive creation (#74376)

* Skip directory symlink recursion on TarFile archive creation

* Add symlink verification

* Address suggestions by danmoseley

Co-authored-by: carlossanlop <[email protected]>

* SkipBlockAlignmentPadding must be executed for all entries (#74396)

* Set modified timestamps on files being extracted from tar archives (#74400)

* Add tests for exotic external tar asset archives, fix some more corner case bugs (#74412)

* Remove unused _readFirstEntry. Remnant from before we created PaxGlobalExtendedAttributesEntry.

* Set the position of the freshly copied data stream to 0, so the first user access of the DataStream property gives them a stream ready to read from the beginning.

* Process a PAX actual entry's data block only after the extended attributes are analyzed, in case the size is found as an extended attribute and needs to be overriden.

* Add tests to verify the entries of the new external tar assets can be read. Verify their DataStream if available.

* Add copyData argument to recent alignment padding tests.

* Throw an exception sooner and with a clearer message when a data section is unexpected for the entry type.

* Allow trailing nulls and spaces in octal fields.
Co-authored-by: @am11 Adeel Mujahid <[email protected]>

* Throw a clearer exception if the unsupported sparse file entry type is encountered. These entries have additional data that indicates the locations of sparse bytes, which cannot be read with just the size field. So to avoid accidentally offseting the reader, we throw.

* Tests.

* Rename to TrimLeadingNullsAndSpaces

Co-authored-by: carlossanlop <[email protected]>

* Remove Compression changes, keep changes confined to Tar.

* Fix build failure due to missing using in TarHelpers.Windows

Co-authored-by: Stephen Toub <[email protected]>
Co-authored-by: Dan Moseley <[email protected]>
Co-authored-by: Adeel Mujahid <[email protected]>
Co-authored-by: carlossanlop <[email protected]>
Co-authored-by: David Cantú <[email protected]>
@ghost ghost locked as resolved and limited conversation to collaborators Sep 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TarFile.ExtractToDirectory() doesn't work sometimes with compressed archives
2 participants