Fix async decompression of .7z files by implementing Memory<byte> ReadAsync overload #1114

Merged
adamhathcock merged 6 commits into master from copilot/fix-7z-file-decompression-error on Jan 7, 2026

Copilot AI commented Jan 6, 2026

Async extraction of .7z files with LZMA/LZMA2 compression threw DataErrorException when using CopyToAsync(), while synchronous CopyTo() worked correctly.

Root Cause

In .NET 6+, ReadExactlyAsync calls ReadAsync(Memory<byte>, CancellationToken). BufferedSubStream only implemented the legacy byte[] overload, causing the base Stream class to fall back to synchronous reads. This corrupted cache state when LZMA's RangeCoder mixed sync ReadByte() calls with async operations.

Changes

  • BufferedSubStream: Added ReadAsync(Memory<byte>, CancellationToken) and RefillCacheAsync() for true async I/O
  • Tests: Added async test coverage for LZMA, LZMA2, Solid, BZip2, and PPMd archives
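
To make the change above concrete, here is a minimal sketch of a caching read stream that overrides both ReadAsync overloads and refills its cache asynchronously. It is an illustration only, not the SharpCompress source: the class name CachedReadStream, the 4096-byte cache size, and the CopyFromCache helper are assumptions invented for this example; only the idea of a Memory<byte> overload backed by a RefillCacheAsync method comes from the PR description.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration; this is NOT SharpCompress's BufferedSubStream.
public sealed class CachedReadStream : Stream
{
    private readonly Stream _inner;
    private readonly byte[] _cache = new byte[4096]; // cache size is an assumption
    private int _cacheOffset;
    private int _cacheLength;

    public CachedReadStream(Stream inner) => _inner = inner;

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Synchronous path (also used by Stream.ReadByte): blocking refill when the cache is empty.
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        if (_cacheOffset == _cacheLength)
        {
            _cacheOffset = 0;
            _cacheLength = _inner.Read(_cache, 0, _cache.Length);
        }
        return CopyFromCache(buffer.AsSpan(offset, count));
    }

    // Memory<byte> overload: refills the same cache with an awaited read.
    public override async ValueTask<int> ReadAsync(
        Memory<byte> buffer, CancellationToken cancellationToken = default)
    {
        if (buffer.Length == 0) return 0;
        if (_cacheOffset == _cacheLength)
        {
            await RefillCacheAsync(cancellationToken).ConfigureAwait(false);
        }
        return CopyFromCache(buffer.Span);
    }

    // byte[] overload forwards to the Memory<byte> overload so both share one code path.
    public override Task<int> ReadAsync(
        byte[] buffer, int offset, int count, CancellationToken cancellationToken)
        => ReadAsync(buffer.AsMemory(offset, count), cancellationToken).AsTask();

    private async ValueTask RefillCacheAsync(CancellationToken cancellationToken)
    {
        _cacheOffset = 0;
        _cacheLength = await _inner.ReadAsync(_cache.AsMemory(), cancellationToken)
            .ConfigureAwait(false);
    }

    // Copies as many cached bytes as fit into the destination and advances the cache offset.
    private int CopyFromCache(Span<byte> destination)
    {
        int n = Math.Min(destination.Length, _cacheLength - _cacheOffset);
        _cache.AsSpan(_cacheOffset, n).CopyTo(destination);
        _cacheOffset += n;
        return n;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Because the synchronous and asynchronous paths in this sketch update the same offset and length fields, a caller can interleave ReadByte() with CopyToAsync() sequentially without skipping or repeating cached bytes. Concurrent reads are not supported.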

Example

// This now works correctly with async operations
using var archive = ArchiveFactory.Open(archivePath);
foreach (var entry in archive.Entries.Where(e => !e.IsDirectory))
{
    using var stream = await entry.OpenEntryStreamAsync(cancellationToken);
    await stream.CopyToAsync(outputStream, cancellationToken);  // Previously threw DataErrorException
}

The fix ensures async operations remain async throughout the decompression pipeline, preventing sync-over-async patterns.

Original prompt

This section details the original issue you should resolve

<issue_title>decompressing big .7z file throws error</issue_title>
<issue_description>lib version 0.42.1
under .net 10

code:

public class SharpCompressExtractor : IArchiveExtractor
{
    public async Task<IReadOnlyCollection<FileInfo>> ExtractAsync(
        string archivePath,
        string destinationDirectory,
        CancellationToken token)
    {
        if (!File.Exists(archivePath))
        {
            throw new FileNotFoundException($"Archive not found: {archivePath}");
        }

        var extractedFiles = new List<FileInfo>();

        using var archive = ArchiveFactory.Open(archivePath);

        foreach (var entry in archive.Entries)
        {
            if (entry.IsDirectory)
            {
                continue;
            }

            token.ThrowIfCancellationRequested();

            var targetPath = Path.Combine(destinationDirectory, entry.Key);

            var targetDir = Path.GetDirectoryName(targetPath);

            if (!string.IsNullOrEmpty(targetDir) && !Directory.Exists(targetDir))
            {
                Directory.CreateDirectory(targetDir);
            }

            using var sourceStream = await entry.OpenEntryStreamAsync(token);

            var fileOptions = new FileStreamOptions
            {
                Mode = FileMode.Create,
                Access = FileAccess.Write,
                Share = FileShare.None,
                BufferSize = 1_024 * 1_024,
                Options = FileOptions.Asynchronous
            };

            await using var targetStream = new FileStream(targetPath, fileOptions);

            await sourceStream.CopyToAsync(targetStream, token);

            var fileInfo = new FileInfo(targetPath);

            extractedFiles.Add(fileInfo);
        }

        return extractedFiles;
    }
}

usage

try
{
    var files = await fileExtractor.ExtractAsync(
        "C:\\whitelist\\20251223.7z",
        "c:/whitelist/out",
        CancellationToken.None);
}
catch (Exception ex)
{
}

Data Error

at SharpCompress.Compressors.LZMA.LzmaStream.d__52.MoveNext()
at SharpCompress.Compressors.LZMA.LzmaStream.d__56.MoveNext()
at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()
at SharpCompress.IO.ReadOnlySubStream.d__24.MoveNext()
at System.IO.Stream.<g__Core|30_0>d.MoveNext()
at SharpCompressExtractor.d__0.MoveNext() in SharpCompressExtractor.cs

File: probably any archive from https://www.podatki.gov.pl/narzedzia/white-list/

The error definitely occurs with this one: https://plikplaski.mf.gov.pl/pliki//20251223.7z

The same file decompresses without problems on Windows 11 using the native extractor or 7-Zip.
</issue_description>

<agent_instructions>compare the synchronous code path to the new asynchronous code path to reproduce the issue</agent_instructions>

Comments on the Issue (you are @copilot in this section)

<comment_new>@adamhathcock
Sounds like the fix I did for https://github.com//pull/1081

I can validate next week though.</comment_new>
<comment_new>@adamhathcock
You're right....testing this myself the async path is broken. However, the sync path works.

If you change CopyToAsync to CopyTo then it works for me

gonna look at a real fix though</comment_new>



Copilot AI and others added 2 commits January 6, 2026 14:13
- Implemented ReadAsync and RefillCacheAsync methods in BufferedSubStream
- Added async test cases for SevenZipArchive (LZMA, LZMA2, Solid, BZip2, PPMd)
- Tests show LZMA, BZip2, and PPMd working correctly
- LZMA2 and Solid archives still failing with Data Error - investigating cache state management

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
The issue was that .NET 10's ReadExactlyAsync calls the Memory<byte> overload of ReadAsync, which wasn't implemented in BufferedSubStream. This caused it to fall back to the base Stream implementation that uses synchronous reads, leading to cache state corruption.

Solution: Added ValueTask<int> ReadAsync(Memory<byte>, CancellationToken) overload for modern .NET versions.

All tests now passing including LZMA2 and Solid archives.

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix error when decompressing big .7z file" to "Fix async decompression of .7z files by implementing Memory<byte> ReadAsync overload" Jan 6, 2026
Copilot AI requested a review from adamhathcock January 6, 2026 14:25
@adamhathcock adamhathcock marked this pull request as ready for review January 6, 2026 15:33
Copilot AI review requested due to automatic review settings January 6, 2026 15:33

Copilot AI left a comment

Pull request overview

This PR fixes a critical bug where async extraction of .7z files with LZMA/LZMA2 compression threw DataErrorException, while synchronous extraction worked correctly. The root cause was that BufferedSubStream only implemented the legacy byte[] ReadAsync overload, causing .NET 6+ to fall back to synchronous reads when ReadExactlyAsync called the Memory<byte> overload. This corrupted cache state when LZMA's RangeCoder mixed synchronous ReadByte() calls with async operations.

The fix implements true async I/O by adding ReadAsync(Memory<byte>, CancellationToken) overloads throughout the decompression pipeline:

  • Added ReadAsync(Memory<byte>) and RefillCacheAsync() to BufferedSubStream for async cache management
  • Added ReadAsync(Memory<byte>) to LzmaStream with proper async decompression handling
  • Enhanced LzOutWindow with Memory support and converted async helpers from Task to ValueTask
  • Added comprehensive async test coverage for LZMA, LZMA2, Solid, BZip2, and PPMd archives

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

File | Description
tests/SharpCompress.Test/SevenZip/SevenZipArchiveAsyncTests.cs | New test file adding async extraction tests for all 7z compression types (LZMA, LZMA2, Solid, BZip2, PPMd)
src/SharpCompress/IO/BufferedSubStream.cs | Added Memory<byte> ReadAsync overload and RefillCacheAsync for true async buffered reads
src/SharpCompress/IO/ReadOnlySubStream.cs | Minor refactoring of the Memory<byte> ReadAsync implementation
src/SharpCompress/Compressors/LZMA/LzmaStream.cs | Added Memory<byte> ReadAsync overload; changed DecodeChunkHeaderAsync return type to ValueTask
src/SharpCompress/Compressors/LZMA/LzmaDecoder.cs | Changed CodeAsync return type from Task to ValueTask for better performance
src/SharpCompress/Compressors/LZMA/LZ/LzOutWindow.cs | Added Read(Memory<byte>) overload and converted async methods to ValueTask; optimized async pattern
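
The file summary mentions converting async helpers from Task to ValueTask. As a rough sketch of why that can help, the hypothetical ByteSource class below (not part of SharpCompress; the name, the 4-byte fetch size, and Task.Yield standing in for real I/O are all assumptions) returns an already-completed ValueTask when data is on hand and only enters the async state machine when it has to fetch more.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of SharpCompress.
public sealed class ByteSource
{
    private readonly byte[] _data;
    private int _position;  // next byte to hand out
    private int _available; // number of bytes already "fetched"

    public ByteSource(byte[] data) => _data = data;

    // Fast path: the byte is already available, so the result is wrapped in a
    // completed ValueTask without allocating a Task object.
    public ValueTask<int> ReadByteAsync()
    {
        if (_position < _available)
        {
            return new ValueTask<int>(_data[_position++]);
        }
        return FetchThenReadAsync();
    }

    // Slow path: simulate asynchronous I/O, then serve the next byte (or -1 at end of data).
    private async ValueTask<int> FetchThenReadAsync()
    {
        if (_available >= _data.Length) return -1;
        await Task.Yield(); // stands in for a real asynchronous read
        _available = Math.Min(_data.Length, _available + 4);
        return _data[_position++];
    }
}
```

In a decoder loop that asks for one byte at a time, most calls would take the fast path, which is the situation where ValueTask tends to pay off over Task.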


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

decompressing big .7z file throws error