Conversation
# Conflicts: # tests/SharpCompress.Test/SevenZip/SevenZipArchiveAsyncTests.cs
|
@copilot fix the SevenZipArchive_LZMA_AsyncStreamExtraction test |
|
@adamhathcock I've opened a new pull request, #1133, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
…adDatabase methods Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Add async I/O support for SevenZip archive initialization
Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
|
Upgrading to xunit v3 that properly supports ValueTask as revealed that not all the tests were running correctly. |
…kable streams) Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This pull request introduces asynchronous detection methods for multiple archive and compression formats to enable non-blocking file type checks. The changes also refactor synchronous detection logic to use pooled buffers for improved performance and remove unnecessary cancellation token parameters from the IArchiveFactory interface.
Changes:
- Added async
IsArchiveAsyncmethods for ACE, ARJ, BZip2, LZip, LZW, ZStandard, RAR, SevenZip, TAR, and XZ formats with proper cancellation token support - Refactored ARC and SevenZip format detection to use
ArrayPool<byte>.Sharedfor buffer management, reducing allocations - Removed
CancellationTokenparameter fromIArchiveFactory.OpenAsyncArchiveinterface methods to simplify the API
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/SharpCompress.Test/SevenZip/SevenZipArchiveAsyncTests.cs | Updated test methods to return Task instead of ValueTask, enabled tests for .NET Framework with conditional compilation, and commented out one test |
| tests/SharpCompress.Test/Mocks/AsyncOnlyStream.cs | Changed base class from Stream to SharpCompressStream |
| tests/SharpCompress.Test/ArchiveTests.cs | Removed two test methods for async archive opening scenarios |
| src/SharpCompress/packages.lock.json | Updated Microsoft.NET.ILLink.Tasks from 10.0.0 to 10.0.1 |
| src/SharpCompress/Factories/ZipFactory.cs | Removed duplicate IsArchiveAsync method |
| src/SharpCompress/Factories/ZStandardFactory.cs | Added CancellationToken parameter to IsArchiveAsync method signature |
| src/SharpCompress/Factories/TarFactory.cs | Updated IsArchiveAsync to call new TarArchive.IsTarFileAsync method and removed CancellationToken from OpenAsyncArchive |
| src/SharpCompress/Factories/SevenZipFactory.cs | Added async detection implementation and updated OpenAsyncArchive methods to remove CancellationToken |
| src/SharpCompress/Factories/RarFactory.cs | Added async detection method and removed CancellationToken from OpenAsyncArchive |
| src/SharpCompress/Factories/GZipFactory.cs | Removed CancellationToken parameter from OpenAsyncArchive method |
| src/SharpCompress/Factories/Factory.cs | Made IsArchiveAsync abstract with CancellationToken parameter and removed unused TryOpenReaderAsync method |
| src/SharpCompress/Factories/ArjFactory.cs | Added async detection method via ArjHeader.IsArchiveAsync |
| src/SharpCompress/Factories/ArcFactory.cs | Implemented async detection using ArrayPool<byte>.Shared for buffer management |
| src/SharpCompress/Factories/AceFactory.cs | Added async detection method via AceHeader.IsArchiveAsync |
| src/SharpCompress/Compressors/ZStandard/ZStandardStream.cs | Added IsZStandardAsync method for async format detection |
| src/SharpCompress/Compressors/Xz/XZStream.cs | Added IsXZStreamAsync method for async format detection |
| src/SharpCompress/Compressors/Lzw/LzwStream.cs | Added IsLzwStreamAsync method for async format detection |
| src/SharpCompress/Compressors/LZMA/LZipStream.cs | Added IsLZipFileAsync and ValidateAndReadSizeAsync methods for async detection |
| src/SharpCompress/Compressors/BZip2/BZip2Stream.cs | Added IsBZip2Async method for async format detection |
| src/SharpCompress/Common/SevenZip/ArchiveReader.cs | Added OpenAsync and ReadDatabaseAsync methods for async archive reading |
| src/SharpCompress/Common/Arj/Headers/ArjHeader.cs | Added IsArchiveAsync method for async detection |
| src/SharpCompress/Common/Ace/Headers/AceHeader.cs | Added IsArchiveAsync method for async detection |
| src/SharpCompress/Archives/Tar/TarArchive.Factory.cs | Added IsTarFileAsync method (currently synchronous implementation) |
| src/SharpCompress/Archives/SevenZip/SevenZipArchive.cs | Refactored LoadEntries and added LoadEntriesAsync with LoadFactoryAsync for async archive processing |
| src/SharpCompress/Archives/SevenZip/SevenZipArchive.Factory.cs | Added IsSevenZipFileAsync, refactored signature matching to use ArrayPool<byte>.Shared, and added OpenAsyncArchive overloads |
| src/SharpCompress/Archives/Rar/RarArchive.Factory.cs | Added IsRarFileAsync method (currently synchronous implementation) |
| src/SharpCompress/Archives/IArchiveFactory.cs | Removed CancellationToken parameter from OpenAsyncArchive(FileInfo) method |
| src/SharpCompress/Archives/AutoArchiveFactory.cs | Updated OpenAsyncArchive to match interface changes |
| src/SharpCompress/Archives/ArchiveFactory.cs | Updated calls to remove CancellationToken parameter when calling factory methods |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public static async ValueTask<bool> IsTarFileAsync( | ||
| Stream stream, | ||
| CancellationToken cancellationToken = default | ||
| ) | ||
| { | ||
| cancellationToken.ThrowIfCancellationRequested(); | ||
| try | ||
| { | ||
| var tarHeader = new TarHeader(new ArchiveEncoding()); | ||
| var reader = new BinaryReader(stream); | ||
| var readSucceeded = tarHeader.Read(reader); | ||
| var isEmptyArchive = | ||
| tarHeader.Name?.Length == 0 | ||
| && tarHeader.Size == 0 | ||
| && Enum.IsDefined(typeof(EntryType), tarHeader.EntryType); | ||
| return readSucceeded || isEmptyArchive; | ||
| } | ||
| catch { } | ||
| return false; | ||
| } |
There was a problem hiding this comment.
The IsTarFileAsync method does not actually perform any asynchronous operations. It synchronously reads from the stream using the synchronous Read method of BinaryReader. This defeats the purpose of having an async method and could block the calling thread. The method should be refactored to use ReadAsync or similar asynchronous I/O operations to properly support async file detection.
| public static async ValueTask<bool> IsRarFileAsync( | ||
| Stream stream, | ||
| ReaderOptions? options = null, | ||
| CancellationToken cancellationToken = default | ||
| ) | ||
| { | ||
| cancellationToken.ThrowIfCancellationRequested(); | ||
| try | ||
| { | ||
| MarkHeader.Read(stream, true, false); | ||
| return true; | ||
| } | ||
| catch | ||
| { | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
The IsRarFileAsync method does not actually perform any asynchronous operations. It synchronously reads from the stream using the synchronous Read method within MarkHeader.Read. This defeats the purpose of having an async method and could block the calling thread. The method should be refactored to use ReadAsync or implement a proper async version of MarkHeader.Read to support true asynchronous file detection.
| @@ -47,9 +47,5 @@ public interface IArchiveFactory : IFactory | |||
| /// <param name="fileInfo">the file to open.</param> | |||
| /// <param name="readerOptions">reading options.</param> | |||
| /// <param name="cancellationToken">Cancellation token.</param> | |||
There was a problem hiding this comment.
The XML documentation comment includes a param tag for cancellationToken on line 49, but the actual method signature on line 50 no longer has this parameter. This will cause a documentation warning. Remove the cancellationToken documentation parameter.
| /// <param name="cancellationToken">Cancellation token.</param> |
| var buffer = ArrayPool<byte>.Shared.Rent(6); | ||
| try | ||
| { | ||
| if (!await stream.ReadFullyAsync(buffer, cancellationToken).ConfigureAwait(false)) |
There was a problem hiding this comment.
The ReadFullyAsync call attempts to read buffer.Length bytes from the stream, but the buffer is rented from ArrayPool<byte>.Shared.Rent(6), which may return a buffer larger than 6 bytes. This could cause the method to try reading more bytes than intended and potentially block or fail. Use the overload ReadFullyAsync(buffer, 0, 6, cancellationToken) to explicitly specify reading exactly 6 bytes.
| if (!await stream.ReadFullyAsync(buffer, cancellationToken).ConfigureAwait(false)) | |
| if (!await stream.ReadFullyAsync(buffer, 0, 6, cancellationToken).ConfigureAwait(false)) |
| private readonly Stream _stream; | ||
|
|
||
| public AsyncOnlyStream(Stream stream) | ||
| : base(stream) | ||
| { | ||
| _stream = stream; | ||
| // Console.WriteLine("AsyncOnlyStream created"); |
There was a problem hiding this comment.
The AsyncOnlyStream now stores the same stream in both the _stream field and the base SharpCompressStream class through the base(stream) call. This creates redundancy and could lead to maintenance issues. Consider removing the _stream field and using the inherited Stream property from SharpCompressStream instead. All references to _stream in this class should be changed to use the base class property.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| && Enum.IsDefined(typeof(EntryType), tarHeader.EntryType); | ||
| return readSucceeded || isEmptyArchive; | ||
| } | ||
| catch { } |
There was a problem hiding this comment.
Poor error handling: empty catch block.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
|
@adamhathcock I've opened a new pull request, #1138, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
# Conflicts: # src/SharpCompress/Common/EntryStream.cs # src/SharpCompress/IO/BufferedSubStream.cs # src/SharpCompress/packages.lock.json
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 213 out of 338 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/SharpCompress/Archives/GZip/GZipArchive.Factory.cs:192
- The result of
ReadFullyAsync(...)is no longer checked. If the stream ends early, the buffer may contain uninitialized/previous data and the signature check becomes unreliable. Restore the previous behavior by checking the boolean return (or equivalent) and returningfalsewhen fewer than 10 bytes are read.
{
var header = ArrayPool<byte>.Shared.Rent(10);
try
{
await stream.ReadFullyAsync(header, 0, 10, cancellationToken).ConfigureAwait(false);
if (header[0] != 0x1F || header[1] != 0x8B || header[2] != 8)
{
return false;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private IAsyncEnumerator<RarFilePart> filePartEnumerator; | ||
| private Stream? currentStream; | ||
|
|
||
| private MultiVolumeReadOnlyAsyncStream(IAsyncEnumerable<RarFilePart> parts) | ||
| { | ||
| filePartEnumerator = parts.GetAsyncEnumerator(); | ||
| } |
There was a problem hiding this comment.
IAsyncEnumerator<RarFilePart> should be disposed when the stream is disposed (it is typically IAsyncDisposable), and currentStream should also be disposed/closed as appropriate. Add an override of Dispose(bool) (and DisposeAsync if applicable) that disposes currentStream and disposes filePartEnumerator to avoid leaking file handles/resources.
| private void InitializeNextFilePart() | ||
| { | ||
| maxPosition = filePartEnumerator.Current.FileHeader.CompressedSize; | ||
| currentPosition = 0; | ||
| currentStream = filePartEnumerator.Current.GetCompressedStream(); | ||
|
|
||
| CurrentCrc = filePartEnumerator.Current.FileHeader.FileCrc; | ||
| } |
There was a problem hiding this comment.
IAsyncEnumerator<RarFilePart> should be disposed when the stream is disposed (it is typically IAsyncDisposable), and currentStream should also be disposed/closed as appropriate. Add an override of Dispose(bool) (and DisposeAsync if applicable) that disposes currentStream and disposes filePartEnumerator to avoid leaking file handles/resources.
| public void AddAllFromDirectory( | ||
| string filePath, | ||
| string searchPattern = "*.*", | ||
| SearchOption searchOption = SearchOption.AllDirectories | ||
| ) | ||
| { | ||
| using (writableArchive.PauseEntryRebuilding()) | ||
| { | ||
| foreach ( | ||
| var path in Directory.EnumerateFiles(filePath, searchPattern, searchOption) | ||
| ) | ||
| { | ||
| var fileInfo = new FileInfo(path); | ||
| writableArchive.AddEntry( | ||
| path.Substring(filePath.Length), | ||
| fileInfo.OpenRead(), | ||
| true, | ||
| fileInfo.Length, | ||
| fileInfo.LastWriteTime | ||
| ); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Two issues here: (1) path.Substring(filePath.Length) is brittle (can produce incorrect keys if filePath doesn’t end with a separator, and can yield leading separators). Prefer Path.GetRelativePath(filePath, path) and normalize separators. (2) fileInfo.OpenRead() is passed in-line; if AddEntry(...) throws, the opened stream will leak. Consider opening the stream in a try/finally (or using) and passing closeStream: false when you own disposal, or otherwise ensure the stream is disposed on failure.
| public async ValueTask AddAllFromDirectoryAsync( | ||
| string filePath, | ||
| string searchPattern = "*.*", | ||
| SearchOption searchOption = SearchOption.AllDirectories | ||
| ) | ||
| { | ||
| using (writableArchive.PauseEntryRebuilding()) | ||
| { | ||
| foreach ( | ||
| var path in Directory.EnumerateFiles(filePath, searchPattern, searchOption) | ||
| ) | ||
| { | ||
| var fileInfo = new FileInfo(path); | ||
| await writableArchive.AddEntryAsync( | ||
| path.Substring(filePath.Length), | ||
| fileInfo.OpenRead(), | ||
| true, | ||
| fileInfo.Length, | ||
| fileInfo.LastWriteTime | ||
| ); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Same concerns as the sync variant: Substring(filePath.Length) is fragile (prefer Path.GetRelativePath) and OpenRead() can leak if AddEntryAsync(...) throws before the archive takes ownership. Open the stream in a using/try-finally and pass closeStream: false (or otherwise guarantee disposal on failure).
|
This has a lot of changes. Getting this into master now then will work on optional buffer sizing so that tests will ensure that asynchronous paths are used instead of buffered sync paths |
For some formats, async should be all the way. 7Zip needs fixing
This pull request introduces asynchronous detection methods for multiple archive and compression formats, allowing non-blocking checks to determine file types. It also refactors and simplifies some synchronous detection logic for improved performance and consistency. The most significant changes are grouped below:
Asynchronous Format Detection:
Added
IsArchiveAsyncor similar async methods for ACE (AceHeader.cs), ARJ (ArjHeader.cs), BZip2 (BZip2Stream.cs), LZip (LZipStream.cs), LZW (LzwStream.cs), ZStandard (ZStandardStream.cs), RAR (RarArchive.Factory.cs), SevenZip (SevenZipArchive.Factory.cs), TAR (TarArchive.Factory.cs), and XZ (XZStream.cs) formats, enabling asynchronous file type checks withCancellationTokensupport. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]Updated the ACE and ARJ factories to implement and expose these new asynchronous detection methods. [1] [2]
Synchronous Detection Refactoring & Performance:
ArrayPool<byte>.Shared) for ARC and SevenZip formats, reducing allocations and improving performance. [1] [2]API Consistency and Interface Changes:
CancellationTokenparameter from theIArchiveFactory.OpenAsyncArchivemethod and its implementations to simplify the API, as the token was not being used in actual asynchronous operations. [1] [2] [3]General Codebase Improvements:
usingdirectives forSystem.ThreadingandSystem.Threading.Taskswhere needed to support the new asynchronous methods. [1] [2] [3] [4] [5] [6] [7]These changes collectively modernize the archive/compression format detection APIs and improve their efficiency and usability.