Skip to content

Clean up for async creation#1132

Merged
adamhathcock merged 116 commits intomasterfrom
adam/async-creation
Jan 27, 2026
Merged

Clean up for async creation#1132
adamhathcock merged 116 commits intomasterfrom
adam/async-creation

Conversation

@adamhathcock
Copy link
Owner

For some formats, async should be all the way. 7Zip needs fixing

This pull request introduces asynchronous detection methods for multiple archive and compression formats, allowing non-blocking checks to determine file types. It also refactors and simplifies some synchronous detection logic for improved performance and consistency. The most significant changes are grouped below:

Asynchronous Format Detection:

  • Added IsArchiveAsync or similar async methods for ACE (AceHeader.cs), ARJ (ArjHeader.cs), BZip2 (BZip2Stream.cs), LZip (LZipStream.cs), LZW (LzwStream.cs), ZStandard (ZStandardStream.cs), RAR (RarArchive.Factory.cs), SevenZip (SevenZipArchive.Factory.cs), TAR (TarArchive.Factory.cs), and XZ (XZStream.cs) formats, enabling asynchronous file type checks with CancellationToken support. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

  • Updated the ACE and ARJ factories to implement and expose these new asynchronous detection methods. [1] [2]

Synchronous Detection Refactoring & Performance:

  • Refactored synchronous detection logic to use pooled buffers (via ArrayPool<byte>.Shared) for ARC and SevenZip formats, reducing allocations and improving performance. [1] [2]

API Consistency and Interface Changes:

  • Removed the CancellationToken parameter from the IArchiveFactory.OpenAsyncArchive method and its implementations to simplify the API, as the token was not being used in actual asynchronous operations. [1] [2] [3]

General Codebase Improvements:

  • Added missing using directives for System.Threading and System.Threading.Tasks where needed to support the new asynchronous methods. [1] [2] [3] [4] [5] [6] [7]

These changes collectively modernize the archive/compression format detection APIs and improve their efficiency and usability.

@adamhathcock
Copy link
Owner Author

@copilot fix the SevenZipArchive_LZMA_AsyncStreamExtraction test

Copy link
Contributor

Copilot AI commented Jan 15, 2026

@adamhathcock I've opened a new pull request, #1133, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 8 commits January 15, 2026 16:53
Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
…adDatabase methods

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Add async I/O support for SevenZip archive initialization
Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
@adamhathcock adamhathcock marked this pull request as ready for review January 16, 2026 09:17
Copilot AI review requested due to automatic review settings January 16, 2026 09:17
@adamhathcock
Copy link
Owner Author

Upgrading to xunit v3 that properly supports ValueTask as revealed that not all the tests were running correctly.

…kable streams)

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces asynchronous detection methods for multiple archive and compression formats to enable non-blocking file type checks. The changes also refactor synchronous detection logic to use pooled buffers for improved performance and remove unnecessary cancellation token parameters from the IArchiveFactory interface.

Changes:

  • Added async IsArchiveAsync methods for ACE, ARJ, BZip2, LZip, LZW, ZStandard, RAR, SevenZip, TAR, and XZ formats with proper cancellation token support
  • Refactored ARC and SevenZip format detection to use ArrayPool<byte>.Shared for buffer management, reducing allocations
  • Removed CancellationToken parameter from IArchiveFactory.OpenAsyncArchive interface methods to simplify the API

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tests/SharpCompress.Test/SevenZip/SevenZipArchiveAsyncTests.cs Updated test methods to return Task instead of ValueTask, enabled tests for .NET Framework with conditional compilation, and commented out one test
tests/SharpCompress.Test/Mocks/AsyncOnlyStream.cs Changed base class from Stream to SharpCompressStream
tests/SharpCompress.Test/ArchiveTests.cs Removed two test methods for async archive opening scenarios
src/SharpCompress/packages.lock.json Updated Microsoft.NET.ILLink.Tasks from 10.0.0 to 10.0.1
src/SharpCompress/Factories/ZipFactory.cs Removed duplicate IsArchiveAsync method
src/SharpCompress/Factories/ZStandardFactory.cs Added CancellationToken parameter to IsArchiveAsync method signature
src/SharpCompress/Factories/TarFactory.cs Updated IsArchiveAsync to call new TarArchive.IsTarFileAsync method and removed CancellationToken from OpenAsyncArchive
src/SharpCompress/Factories/SevenZipFactory.cs Added async detection implementation and updated OpenAsyncArchive methods to remove CancellationToken
src/SharpCompress/Factories/RarFactory.cs Added async detection method and removed CancellationToken from OpenAsyncArchive
src/SharpCompress/Factories/GZipFactory.cs Removed CancellationToken parameter from OpenAsyncArchive method
src/SharpCompress/Factories/Factory.cs Made IsArchiveAsync abstract with CancellationToken parameter and removed unused TryOpenReaderAsync method
src/SharpCompress/Factories/ArjFactory.cs Added async detection method via ArjHeader.IsArchiveAsync
src/SharpCompress/Factories/ArcFactory.cs Implemented async detection using ArrayPool<byte>.Shared for buffer management
src/SharpCompress/Factories/AceFactory.cs Added async detection method via AceHeader.IsArchiveAsync
src/SharpCompress/Compressors/ZStandard/ZStandardStream.cs Added IsZStandardAsync method for async format detection
src/SharpCompress/Compressors/Xz/XZStream.cs Added IsXZStreamAsync method for async format detection
src/SharpCompress/Compressors/Lzw/LzwStream.cs Added IsLzwStreamAsync method for async format detection
src/SharpCompress/Compressors/LZMA/LZipStream.cs Added IsLZipFileAsync and ValidateAndReadSizeAsync methods for async detection
src/SharpCompress/Compressors/BZip2/BZip2Stream.cs Added IsBZip2Async method for async format detection
src/SharpCompress/Common/SevenZip/ArchiveReader.cs Added OpenAsync and ReadDatabaseAsync methods for async archive reading
src/SharpCompress/Common/Arj/Headers/ArjHeader.cs Added IsArchiveAsync method for async detection
src/SharpCompress/Common/Ace/Headers/AceHeader.cs Added IsArchiveAsync method for async detection
src/SharpCompress/Archives/Tar/TarArchive.Factory.cs Added IsTarFileAsync method (currently synchronous implementation)
src/SharpCompress/Archives/SevenZip/SevenZipArchive.cs Refactored LoadEntries and added LoadEntriesAsync with LoadFactoryAsync for async archive processing
src/SharpCompress/Archives/SevenZip/SevenZipArchive.Factory.cs Added IsSevenZipFileAsync, refactored signature matching to use ArrayPool<byte>.Shared, and added OpenAsyncArchive overloads
src/SharpCompress/Archives/Rar/RarArchive.Factory.cs Added IsRarFileAsync method (currently synchronous implementation)
src/SharpCompress/Archives/IArchiveFactory.cs Removed CancellationToken parameter from OpenAsyncArchive(FileInfo) method
src/SharpCompress/Archives/AutoArchiveFactory.cs Updated OpenAsyncArchive to match interface changes
src/SharpCompress/Archives/ArchiveFactory.cs Updated calls to remove CancellationToken parameter when calling factory methods

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 168 to 187
public static async ValueTask<bool> IsTarFileAsync(
Stream stream,
CancellationToken cancellationToken = default
)
{
cancellationToken.ThrowIfCancellationRequested();
try
{
var tarHeader = new TarHeader(new ArchiveEncoding());
var reader = new BinaryReader(stream);
var readSucceeded = tarHeader.Read(reader);
var isEmptyArchive =
tarHeader.Name?.Length == 0
&& tarHeader.Size == 0
&& Enum.IsDefined(typeof(EntryType), tarHeader.EntryType);
return readSucceeded || isEmptyArchive;
}
catch { }
return false;
}
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IsTarFileAsync method does not actually perform any asynchronous operations. It synchronously reads from the stream using the synchronous Read method of BinaryReader. This defeats the purpose of having an async method and could block the calling thread. The method should be refactored to use ReadAsync or similar asynchronous I/O operations to properly support async file detection.

Copilot uses AI. Check for mistakes.
Comment on lines 168 to 184
public static async ValueTask<bool> IsRarFileAsync(
Stream stream,
ReaderOptions? options = null,
CancellationToken cancellationToken = default
)
{
cancellationToken.ThrowIfCancellationRequested();
try
{
MarkHeader.Read(stream, true, false);
return true;
}
catch
{
return false;
}
}
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IsRarFileAsync method does not actually perform any asynchronous operations. It synchronously reads from the stream using the synchronous Read method within MarkHeader.Read. This defeats the purpose of having an async method and could block the calling thread. The method should be refactored to use ReadAsync or implement a proper async version of MarkHeader.Read to support true asynchronous file detection.

Copilot uses AI. Check for mistakes.
@@ -47,9 +47,5 @@ public interface IArchiveFactory : IFactory
/// <param name="fileInfo">the file to open.</param>
/// <param name="readerOptions">reading options.</param>
/// <param name="cancellationToken">Cancellation token.</param>
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The XML documentation comment includes a param tag for cancellationToken on line 49, but the actual method signature on line 50 no longer has this parameter. This will cause a documentation warning. Remove the cancellationToken documentation parameter.

Suggested change
/// <param name="cancellationToken">Cancellation token.</param>

Copilot uses AI. Check for mistakes.
var buffer = ArrayPool<byte>.Shared.Rent(6);
try
{
if (!await stream.ReadFullyAsync(buffer, cancellationToken).ConfigureAwait(false))
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReadFullyAsync call attempts to read buffer.Length bytes from the stream, but the buffer is rented from ArrayPool<byte>.Shared.Rent(6), which may return a buffer larger than 6 bytes. This could cause the method to try reading more bytes than intended and potentially block or fail. Use the overload ReadFullyAsync(buffer, 0, 6, cancellationToken) to explicitly specify reading exactly 6 bytes.

Suggested change
if (!await stream.ReadFullyAsync(buffer, cancellationToken).ConfigureAwait(false))
if (!await stream.ReadFullyAsync(buffer, 0, 6, cancellationToken).ConfigureAwait(false))

Copilot uses AI. Check for mistakes.
Comment on lines 11 to 17
private readonly Stream _stream;

public AsyncOnlyStream(Stream stream)
: base(stream)
{
_stream = stream;
// Console.WriteLine("AsyncOnlyStream created");
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AsyncOnlyStream now stores the same stream in both the _stream field and the base SharpCompressStream class through the base(stream) call. This creates redundancy and could lead to maintenance issues. Consider removing the _stream field and using the inherited Stream property from SharpCompressStream instead. All references to _stream in this class should be changed to use the base class property.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

&& Enum.IsDefined(typeof(EntryType), tarHeader.EntryType);
return readSucceeded || isEmptyArchive;
}
catch { }
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poor error handling: empty catch block.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Copilot AI and others added 2 commits January 16, 2026 09:26
Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Copy link
Contributor

Copilot AI commented Jan 16, 2026

@adamhathcock I've opened a new pull request, #1138, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings January 16, 2026 09:29
Copilot AI review requested due to automatic review settings January 26, 2026 15:50
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 213 out of 338 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/SharpCompress/Archives/GZip/GZipArchive.Factory.cs:192

  • The result of ReadFullyAsync(...) is no longer checked. If the stream ends early, the buffer may contain uninitialized/previous data and the signature check becomes unreliable. Restore the previous behavior by checking the boolean return (or equivalent) and returning false when fewer than 10 bytes are read.
    {
        var header = ArrayPool<byte>.Shared.Rent(10);
        try
        {
            await stream.ReadFullyAsync(header, 0, 10, cancellationToken).ConfigureAwait(false);

            if (header[0] != 0x1F || header[1] != 0x8B || header[2] != 8)
            {
                return false;
            }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +39 to +45
private IAsyncEnumerator<RarFilePart> filePartEnumerator;
private Stream? currentStream;

private MultiVolumeReadOnlyAsyncStream(IAsyncEnumerable<RarFilePart> parts)
{
filePartEnumerator = parts.GetAsyncEnumerator();
}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IAsyncEnumerator<RarFilePart> should be disposed when the stream is disposed (it is typically IAsyncDisposable), and currentStream should also be disposed/closed as appropriate. Add an override of Dispose(bool) (and DisposeAsync if applicable) that disposes currentStream and disposes filePartEnumerator to avoid leaking file handles/resources.

Copilot uses AI. Check for mistakes.
Comment on lines +49 to +56
private void InitializeNextFilePart()
{
maxPosition = filePartEnumerator.Current.FileHeader.CompressedSize;
currentPosition = 0;
currentStream = filePartEnumerator.Current.GetCompressedStream();

CurrentCrc = filePartEnumerator.Current.FileHeader.FileCrc;
}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IAsyncEnumerator<RarFilePart> should be disposed when the stream is disposed (it is typically IAsyncDisposable), and currentStream should also be disposed/closed as appropriate. Add an override of Dispose(bool) (and DisposeAsync if applicable) that disposes currentStream and disposes filePartEnumerator to avoid leaking file handles/resources.

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +34
public void AddAllFromDirectory(
string filePath,
string searchPattern = "*.*",
SearchOption searchOption = SearchOption.AllDirectories
)
{
using (writableArchive.PauseEntryRebuilding())
{
foreach (
var path in Directory.EnumerateFiles(filePath, searchPattern, searchOption)
)
{
var fileInfo = new FileInfo(path);
writableArchive.AddEntry(
path.Substring(filePath.Length),
fileInfo.OpenRead(),
true,
fileInfo.Length,
fileInfo.LastWriteTime
);
}
}
}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues here: (1) path.Substring(filePath.Length) is brittle (can produce incorrect keys if filePath doesn’t end with a separator, and can yield leading separators). Prefer Path.GetRelativePath(filePath, path) and normalize separators. (2) fileInfo.OpenRead() is passed in-line; if AddEntry(...) throws, the opened stream will leak. Consider opening the stream in a try/finally (or using) and passing closeStream: false when you own disposal, or otherwise ensure the stream is disposed on failure.

Copilot uses AI. Check for mistakes.
Comment on lines +14 to +36
public async ValueTask AddAllFromDirectoryAsync(
string filePath,
string searchPattern = "*.*",
SearchOption searchOption = SearchOption.AllDirectories
)
{
using (writableArchive.PauseEntryRebuilding())
{
foreach (
var path in Directory.EnumerateFiles(filePath, searchPattern, searchOption)
)
{
var fileInfo = new FileInfo(path);
await writableArchive.AddEntryAsync(
path.Substring(filePath.Length),
fileInfo.OpenRead(),
true,
fileInfo.Length,
fileInfo.LastWriteTime
);
}
}
}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concerns as the sync variant: Substring(filePath.Length) is fragile (prefer Path.GetRelativePath) and OpenRead() can leak if AddEntryAsync(...) throws before the archive takes ownership. Open the stream in a using/try-finally and pass closeStream: false (or otherwise guarantee disposal on failure).

Copilot uses AI. Check for mistakes.
@adamhathcock
Copy link
Owner Author

This has a lot of changes. Getting this into master now then will work on optional buffer sizing so that tests will ensure that asynchronous paths are used instead of buffered sync paths

@adamhathcock adamhathcock merged commit f64fa53 into master Jan 27, 2026
5 checks passed
@adamhathcock adamhathcock deleted the adam/async-creation branch January 27, 2026 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments