3 changes: 2 additions & 1 deletion src/libstore-tests/nar-info.cc
@@ -4,6 +4,7 @@
#include "nix/store/path-info.hh"
#include "nix/store/nar-info.hh"

+ #include "nix/util/compression-algo.hh"
#include "nix/util/tests/characterization.hh"
#include "nix/store/tests/libstore.hh"

@@ -65,7 +66,7 @@ static NarInfo makeNarInfo(const Store & store, bool includeImpureInfo)
};

info.url = "nar/1w1fff338fvdw53sqgamddn1b2xgds473pv6y13gizdbqjv4i5p3.nar.xz";
- info.compression = "xz";
+ info.compression = CompressionAlgo::xz;
info.fileHash = Hash::parseSRI("sha256-FePFYIlMuycIXPZbWi7LGEiMmZSX9FMbaQenWBzm1Sc=");
info.fileSize = 4029176;
}
2 changes: 1 addition & 1 deletion src/libstore/binary-cache-store.cc
@@ -165,7 +165,7 @@ ref<const ValidPathInfo> BinaryCacheStore::addToStoreCommon(

auto info = mkInfo(narHashSink.finish());
auto narInfo = make_ref<NarInfo>(info);
- narInfo->compression = config.compression.to_string(); // FIXME: Make NarInfo use CompressionAlgo
+ narInfo->compression = config.compression;
auto [fileHash, fileSize] = fileHashSink.finish();
narInfo->fileHash = fileHash;
narInfo->fileSize = fileSize;
4 changes: 3 additions & 1 deletion src/libstore/builtins/fetchurl.cc
@@ -3,6 +3,7 @@
#include "nix/store/store-api.hh"
#include "nix/store/globals.hh"
#include "nix/util/archive.hh"
+ #include "nix/util/compression-algo.hh"
#include "nix/util/compression.hh"
#include "nix/util/file-system.hh"

@@ -54,7 +55,8 @@ static void builtinFetchurl(const BuiltinBuilderContext & ctx)
}
#endif

- auto decompressor = makeDecompressionSink(unpack && hasSuffix(mainUrl, ".xz") ? "xz" : "none", sink);
+ auto decompressor = makeDecompressionSink(
+     unpack && hasSuffix(mainUrl, ".xz") ? CompressionAlgo::xz : CompressionAlgo::none, sink);
fileTransfer->download(std::move(request), *decompressor);
decompressor->finish();
});
9 changes: 5 additions & 4 deletions src/libstore/filetransfer.cc
@@ -1,5 +1,6 @@
#include "nix/store/filetransfer.hh"
#include "nix/store/globals.hh"
+ #include "nix/util/compression-algo.hh"
#include "nix/util/config-global.hh"
#include "nix/store/store-api.hh"
#include "nix/util/compression.hh"
@@ -106,7 +107,7 @@ struct curlFileTransfer : public FileTransfer

curlSList requestHeaders;

- std::string encoding;
+ std::optional<CompressionAlgo> encoding;

bool acceptRanges = false;

@@ -288,7 +289,7 @@ struct curlFileTransfer : public FileTransfer
result.bodySize = 0;
statusMsg = trim(match.str(1));
acceptRanges = false;
- encoding = "";
+ encoding = std::nullopt;
appendCurrentUrl();
} else {

@@ -312,7 +313,7 @@
}

else if (name == "content-encoding")
- encoding = trim(line.substr(i + 1));
+ encoding = parseCompressionAlgo(trim(line.substr(i + 1)));
Author commented:

@xokdvium, I'm planning to create a function here that parses the Content-Encoding header and returns the CompressionAlgo enum.

Is there an exhaustive list of all the Content-Encoding values? I only see 3 tokens used in the RFC spec.

Contributor commented:

It's all quite messy, and we are certainly not spec compliant here – we'd want to tighten up our handling here. Good news is: there are not so many non-standard encodings that we'd want to continue supporting.

Basically (all parsed case-insensitively): gzip, x-gzip, compress, x-compress (those are in the spec), deflate (doesn't seem like we can support this via libarchive), br, zstd. As for non-standard ones, we can probably stick with supporting bzip2, since it seemingly did appear in practice. xz and everything else should probably error out (but we do for whatever reason send it in Accept-Encoding – that was clearly a mistake). If we want to be extra cautious, we can probably keep accepting xz and issue a warning.

Everything else also mostly doesn't work, because we currently error out if libarchive tries to decompress/compress something it hasn't been linked with the appropriate dependencies for (it tries to fall back to an external program, issues a warning, and we treat it as an error).

Stacked encodings, where multiple filters are applied (like Content-Encoding: gzip, deflate), should also error out, but we can think about supporting those in the future. That will require multiple chained decompression sinks.

Contributor commented:

Also identity. none and empty strings should be rejected – those are the spellings used for the NAR compression algorithms, not valid Content-Encoding values.

Author commented:

Makes sense. I’ll open another PR to parse Content-Encoding (case-insensitive) and support:
gzip, x-gzip, compress, x-compress, br, zstd, bzip2, and identity.

I’ll error out on unknown encodings and reject stacked encodings for now.
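A rough sketch of what such a parser could look like, given the tokens listed above. Note that `parseContentEncoding` is a hypothetical name, and the `CompressionAlgo` enum here is a stand-in for the real one in `nix/util/compression-algo.hh`:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Stand-in for Nix's CompressionAlgo enum (nix/util/compression-algo.hh);
// the real enum is generated from an X-macro and may differ.
enum class CompressionAlgo { none, brotli, bzip2, gzip, compress, zstd };

// Parse a single HTTP Content-Encoding token, case-insensitively,
// accepting the spec tokens plus legacy aliases like x-gzip.
CompressionAlgo parseContentEncoding(std::string encoding)
{
    std::transform(encoding.begin(), encoding.end(), encoding.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    if (encoding == "identity")
        return CompressionAlgo::none;
    if (encoding == "gzip" || encoding == "x-gzip")
        return CompressionAlgo::gzip;
    if (encoding == "compress" || encoding == "x-compress")
        return CompressionAlgo::compress;
    if (encoding == "br")
        return CompressionAlgo::brotli;
    if (encoding == "zstd")
        return CompressionAlgo::zstd;
    if (encoding == "bzip2") // non-standard, but seen in practice
        return CompressionAlgo::bzip2;
    // "", "none", stacked encodings ("gzip, deflate") and unknown
    // tokens are all rejected.
    throw std::runtime_error("unsupported Content-Encoding: " + encoding);
}
```

Whether xz should hard-error or only warn is still open per the discussion above; this sketch rejects it.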

Contributor commented:

Thank you so much! This has been on my radar for a while! Thanks for tackling it ❤️


else if (name == "accept-ranges" && toLower(trim(line.substr(i + 1))) == "bytes")
acceptRanges = true;
@@ -738,7 +739,7 @@ struct curlFileTransfer : public FileTransfer
sink, we can only retry if the server supports
ranged requests. */
if (err == Transient && attempt < request.tries
- && (!this->request.dataCallback || writtenToSink == 0 || (acceptRanges && encoding.empty()))) {
+ && (!this->request.dataCallback || writtenToSink == 0 || (acceptRanges && !encoding.has_value()))) {
int ms = retryTimeMs
* std::pow(
2.0f, attempt - 1 + std::uniform_real_distribution<>(0.0, 0.5)(fileTransfer.mt19937));
3 changes: 2 additions & 1 deletion src/libstore/include/nix/store/nar-info.hh
@@ -13,7 +13,8 @@ struct StoreDirConfig;
struct UnkeyedNarInfo : virtual UnkeyedValidPathInfo
{
std::string url;
- std::string compression; // FIXME: Use CompressionAlgo
+
+ std::optional<CompressionAlgo> compression;
std::optional<Hash> fileHash;
uint64_t fileSize = 0;

3 changes: 2 additions & 1 deletion src/libstore/local-fs-store.cc
@@ -1,4 +1,5 @@
#include "nix/util/archive.hh"
+ #include "nix/util/compression-algo.hh"
#include "nix/util/posix-source-accessor.hh"
#include "nix/store/store-api.hh"
#include "nix/store/local-fs-store.hh"
@@ -130,7 +131,7 @@ std::optional<std::string> LocalFSStore::getBuildLogExact(const StorePath & path

else if (pathExists(logBz2Path)) {
try {
- return decompress("bzip2", readFile(logBz2Path));
+ return decompress(CompressionAlgo::bzip2, readFile(logBz2Path));
} catch (Error &) {
}
}
6 changes: 4 additions & 2 deletions src/libstore/nar-info-disk-cache.cc
@@ -1,4 +1,5 @@
#include "nix/store/nar-info-disk-cache.hh"
+ #include "nix/util/compression-algo.hh"
#include "nix/util/users.hh"
#include "nix/util/sync.hh"
#include "nix/store/sqlite.hh"
@@ -269,7 +270,7 @@ struct NarInfoDiskCacheImpl : NarInfoDiskCache
auto narInfo = make_ref<NarInfo>(
cache.storeDir, StorePath(hashPart + "-" + namePart), Hash::parseAnyPrefixed(queryNAR.getStr(6)));
narInfo->url = queryNAR.getStr(2);
- narInfo->compression = queryNAR.getStr(3);
+ narInfo->compression = parseCompressionAlgo(queryNAR.getStr(3));
Contributor commented:

Out of caution, shouldn't we continue treating empty strings as none? I'm pretty sure the current version of the sqlite db could never have empty strings there, but you can't be too careful with these things.

Contributor commented:

Maybe drop a FIXME so that we could remove that later?
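As an illustration of that suggestion, a backward-compatible read could map the legacy empty string to none and carry the FIXME. The names and enum entries below are stand-ins for the real libutil code:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-ins for Nix's CompressionAlgo / parseCompressionAlgo from
// nix/util/compression-algo.hh; entries abbreviated.
enum class CompressionAlgo { none, xz, bzip2 };

CompressionAlgo parseCompressionAlgo(const std::string & s)
{
    if (s == "none") return CompressionAlgo::none;
    if (s == "xz") return CompressionAlgo::xz;
    if (s == "bzip2") return CompressionAlgo::bzip2;
    throw std::invalid_argument("unknown compression algorithm: " + s);
}

// FIXME: remove the empty-string fallback once we are confident no
// cache.sqlite in the wild stores "" in the compression column.
CompressionAlgo parseCompressionAlgoFromCache(const std::string & s)
{
    return s.empty() ? CompressionAlgo::none : parseCompressionAlgo(s);
}
```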

if (!queryNAR.isNull(4))
narInfo->fileHash = Hash::parseAnyPrefixed(queryNAR.getStr(4));
narInfo->fileSize = queryNAR.getInt(5);
@@ -334,7 +335,8 @@ struct NarInfoDiskCacheImpl : NarInfoDiskCache

state->insertNAR
.use()(cache.id)(hashPart) (std::string(info->path.name()))(
- narInfo ? narInfo->url : "", narInfo != 0)(narInfo ? narInfo->compression : "", narInfo != 0)(
+ narInfo ? narInfo->url : "",
+ narInfo != 0)(narInfo ? showCompressionAlgo(narInfo->compression.value()) : "", narInfo != 0)(
narInfo && narInfo->fileHash ? narInfo->fileHash->to_string(HashFormat::Nix32, true) : "",
narInfo && narInfo->fileHash)(
narInfo ? narInfo->fileSize : 0, narInfo != 0 && narInfo->fileSize)(info->narHash.to_string(
23 changes: 14 additions & 9 deletions src/libstore/nar-info.cc
@@ -1,8 +1,10 @@
#include "nix/store/globals.hh"
#include "nix/store/nar-info.hh"
#include "nix/store/store-api.hh"
+ #include "nix/util/compression-algo.hh"
#include "nix/util/strings.hh"
#include "nix/util/json-utils.hh"
+ #include <optional>

namespace nix {

@@ -52,7 +54,7 @@ NarInfo::NarInfo(const StoreDirConfig & store, const std::string & s, const std:
} else if (name == "URL")
url = value;
else if (name == "Compression")
- compression = value;
+ compression = value.empty() ? std::nullopt : std::make_optional(parseCompressionAlgo(value));
else if (name == "FileHash")
fileHash = parseHashField(value);
else if (name == "FileSize") {
@@ -90,8 +92,8 @@ NarInfo::NarInfo(const StoreDirConfig & store, const std::string & s, const std:
line += 1;
}

- if (compression == "")
-     compression = "bzip2";
+ if (!compression.has_value())
+     compression = CompressionAlgo::bzip2;

if (!havePath || !haveNarHash || url.empty() || narSize == 0) {
line = 0; // don't include line information in the error
@@ -109,8 +111,8 @@ std::string NarInfo::to_string(const StoreDirConfig & store) const
std::string res;
res += "StorePath: " + store.printStorePath(path) + "\n";
res += "URL: " + url + "\n";
- assert(compression != "");
- res += "Compression: " + compression + "\n";
+ assert(compression.has_value());
+ res += "Compression: " + showCompressionAlgo(compression.value()) + "\n";
assert(fileHash && fileHash->algo == HashAlgorithm::SHA256);
res += "FileHash: " + fileHash->to_string(HashFormat::Nix32, true) + "\n";
res += "FileSize: " + std::to_string(fileSize) + "\n";
@@ -142,8 +144,8 @@ UnkeyedNarInfo::toJSON(const StoreDirConfig * store, bool includeImpureInfo, Pat
if (includeImpureInfo) {
if (!url.empty())
jsonObject["url"] = url;
- if (!compression.empty())
-     jsonObject["compression"] = compression;
+ if (compression.has_value())
+     jsonObject["compression"] = showCompressionAlgo(compression.value());
if (fileHash) {
if (format == PathInfoJsonFormat::V1)
jsonObject["downloadHash"] = fileHash->to_string(HashFormat::SRI, true);
@@ -170,8 +172,11 @@ UnkeyedNarInfo UnkeyedNarInfo::fromJSON(const StoreDirConfig * store, const nloh
if (auto * url = get(obj, "url"))
res.url = getString(*url);

- if (auto * compression = get(obj, "compression"))
-     res.compression = getString(*compression);
+ if (auto * compression = get(obj, "compression")) {
+     auto compression_value = getString(*compression);
+     res.compression =
+         compression_value.empty() ? std::nullopt : std::make_optional(parseCompressionAlgo(compression_value));
+ }

if (auto * downloadHash = get(obj, "downloadHash")) {
if (format == PathInfoJsonFormat::V1)
Expand Down
15 changes: 8 additions & 7 deletions src/libutil-tests/compression.cc
@@ -16,7 +16,8 @@ TEST(compress, noneMethodDoesNothingToTheInput)

TEST(decompress, decompressNoneCompressed)
{
- auto method = "none";
+
+ auto method = CompressionAlgo::none;
auto str = "slfja;sljfklsa;jfklsjfkl;sdjfkl;sadjfkl;sdjf;lsdfjsadlf";
auto o = decompress(method, str);

@@ -27,7 +28,7 @@ TEST(decompress, decompressEmptyCompressed)
{
// Empty-method decompression used e.g. by S3 store
// (Content-Encoding == "").
- auto method = "";
+ auto method = CompressionAlgo::none; // Do we handle this in S3 store???
Author commented:

What about cases where we have Content-Encoding as an empty string? Do we want to map it to CompressionAlgo::none?

auto str = "slfja;sljfklsa;jfklsjfkl;sdjfkl;sadjfkl;sdjf;lsdfjsadlf";
auto o = decompress(method, str);

@@ -36,7 +37,7 @@

TEST(decompress, decompressXzCompressed)
{
- auto method = "xz";
+ auto method = CompressionAlgo::xz;
auto str = "slfja;sljfklsa;jfklsjfkl;sdjfkl;sadjfkl;sdjf;lsdfjsadlf";
auto o = decompress(method, compress(CompressionAlgo::xz, str));

@@ -45,7 +46,7 @@

TEST(decompress, decompressBzip2Compressed)
{
- auto method = "bzip2";
+ auto method = CompressionAlgo::bzip2;
auto str = "slfja;sljfklsa;jfklsjfkl;sdjfkl;sadjfkl;sdjf;lsdfjsadlf";
auto o = decompress(method, compress(CompressionAlgo::bzip2, str));

@@ -54,7 +55,7 @@

TEST(decompress, decompressBrCompressed)
{
- auto method = "br";
+ auto method = CompressionAlgo::brotli;
auto str = "slfja;sljfklsa;jfklsjfkl;sdjfkl;sadjfkl;sdjf;lsdfjsadlf";
auto o = decompress(method, compress(CompressionAlgo::brotli, str));

@@ -63,7 +64,7 @@

TEST(decompress, decompressInvalidInputThrowsCompressionError)
{
- auto method = "bzip2";
+ auto method = CompressionAlgo::bzip2;
auto str = "this is a string that does not qualify as valid bzip2 data";

ASSERT_THROW(decompress(method, str), CompressionError);
@@ -88,7 +89,7 @@ TEST(makeCompressionSink, compressAndDecompress)
{
StringSink strSink;
auto inputString = "slfja;sljfklsa;jfklsjfkl;sdjfkl;sadjfkl;sdjf;lsdfjsadlf";
- auto decompressionSink = makeDecompressionSink("bzip2", strSink);
+ auto decompressionSink = makeDecompressionSink(CompressionAlgo::bzip2, strSink);
auto sink = makeCompressionSink(CompressionAlgo::bzip2, *decompressionSink);

(*sink)(inputString);
13 changes: 7 additions & 6 deletions src/libutil/compression.cc
@@ -1,4 +1,5 @@
#include "nix/util/compression.hh"
+ #include "nix/util/compression-algo.hh"
#include "nix/util/signals.hh"
#include "nix/util/tarfile.hh"
#include "nix/util/finally.hh"
@@ -38,9 +39,9 @@ struct ArchiveDecompressionSource : Source
{
std::unique_ptr<TarArchive> archive = 0;
Source & src;
- std::optional<std::string> compressionMethod;
+ std::optional<CompressionAlgo> compressionMethod;

- ArchiveDecompressionSource(Source & src, std::optional<std::string> compressionMethod = std::nullopt)
+ ArchiveDecompressionSource(Source & src, std::optional<CompressionAlgo> compressionMethod = std::nullopt)
: src(src)
, compressionMethod(std::move(compressionMethod))
{
@@ -239,7 +240,7 @@ struct BrotliDecompressionSink : ChunkedCompressionSink
}
};

- std::string decompress(const std::string & method, std::string_view in)
+ std::string decompress(const std::optional<CompressionAlgo> & method, std::string_view in)
{
StringSink ssink;
auto sink = makeDecompressionSink(method, ssink);
@@ -248,11 +249,11 @@ std::string decompress(const std::string & method, std::string_view in)
return std::move(ssink.s);
}

- std::unique_ptr<FinishSink> makeDecompressionSink(const std::string & method, Sink & nextSink)
+ std::unique_ptr<FinishSink> makeDecompressionSink(const std::optional<CompressionAlgo> & method, Sink & nextSink)
{
- if (method == "none" || method == "" || method == "identity")
+ if (!method.has_value() || method == CompressionAlgo::none)
return std::make_unique<NoneSink>(nextSink);
- else if (method == "br")
+ else if (method == CompressionAlgo::brotli)
return std::make_unique<BrotliDecompressionSink>(nextSink);
else
return sourceToSink([method, &nextSink](Source & source) {
1 change: 1 addition & 0 deletions src/libutil/include/nix/util/compression-algo.hh
@@ -7,6 +7,7 @@

namespace nix {

+ // Do we want to add Identity to the list???
#define NIX_FOR_EACH_COMPRESSION_ALGO(MACRO) \
@Hrushi20 (Author) commented on Feb 14, 2026:

Do we want to add Identity to the CompressionAlgo enum?

The Content-Encoding HTTP header might contain identity.

Contributor commented:

Thing is that Content-Encoding should be parsed by a very separate function. It's specified by the HTTP spec (or rather refers to another RFC), is case-insensitive, and has a much more limited range of supported values plus some legacy ones like x-gzip. I was going to rewrite this code to handle Content-Encoding in a more compliant way. Better not to use the parsing logic here for now. I suppose I can put up a PR to address that soon and we can rebase yours on top of that?

Or could you hold off on replacing strings with enums in the filetransfer code for now?

@Hrushi20 (Author) commented on Feb 14, 2026:

Would this be a major change, or is it something I can look into and implement? If you point me to any resources, I can take a shot at it.

Contributor commented:

I think it would be pretty small - it would just take some of your changes from this PR for the decompression sink.

MACRO("none", none) \
MACRO("br", brotli) \
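For context, this X-macro table is what makes the string/enum conversions mechanical. A minimal sketch of how showCompressionAlgo and parseCompressionAlgo could be generated from such a table (entries abbreviated, and the names are assumed to match libutil's, not copied from it):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Assumed shape of the X-macro table; the real one lives in
// nix/util/compression-algo.hh and has more entries.
#define NIX_FOR_EACH_COMPRESSION_ALGO(MACRO) \
    MACRO("none", none)                      \
    MACRO("br", brotli)                      \
    MACRO("bzip2", bzip2)                    \
    MACRO("xz", xz)

enum class CompressionAlgo {
#define NIX_ENUMERATE(str, name) name,
    NIX_FOR_EACH_COMPRESSION_ALGO(NIX_ENUMERATE)
#undef NIX_ENUMERATE
};

// Enum -> canonical string, generated from the same table.
std::string showCompressionAlgo(CompressionAlgo algo)
{
    switch (algo) {
#define NIX_SHOW(str, name) \
    case CompressionAlgo::name: \
        return str;
        NIX_FOR_EACH_COMPRESSION_ALGO(NIX_SHOW)
#undef NIX_SHOW
    }
    throw std::logic_error("unreachable");
}

// String -> enum, rejecting anything not in the table.
CompressionAlgo parseCompressionAlgo(const std::string & s)
{
#define NIX_PARSE(str, name) \
    if (s == str) \
        return CompressionAlgo::name;
    NIX_FOR_EACH_COMPRESSION_ALGO(NIX_PARSE)
#undef NIX_PARSE
    throw std::invalid_argument("unknown compression algorithm: " + s);
}
```

Adding an algorithm then means touching exactly one line of the table, and both conversions stay in sync automatically.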
5 changes: 3 additions & 2 deletions src/libutil/include/nix/util/compression.hh
@@ -6,6 +6,7 @@
#include "nix/util/serialise.hh"
#include "nix/util/compression-algo.hh"

+ #include <optional>
#include <string>

namespace nix {
@@ -17,9 +18,9 @@ struct CompressionSink : BufferedSink, FinishSink
using FinishSink::finish;
};

- std::string decompress(const std::string & method, std::string_view in);
+ std::string decompress(const std::optional<CompressionAlgo> & method, std::string_view in);

- std::unique_ptr<FinishSink> makeDecompressionSink(const std::string & method, Sink & nextSink);
+ std::unique_ptr<FinishSink> makeDecompressionSink(const std::optional<CompressionAlgo> & method, Sink & nextSink);

std::string compress(CompressionAlgo method, std::string_view in, const bool parallel = false, int level = -1);

3 changes: 2 additions & 1 deletion src/libutil/include/nix/util/tarfile.hh
@@ -1,6 +1,7 @@
#pragma once
///@file

+ #include "nix/util/compression-algo.hh"
#include "nix/util/serialise.hh"
#include "nix/util/fs-sink.hh"
#include <archive.h>
@@ -22,7 +23,7 @@ struct TarArchive
/// @param raw - Whether to enable raw file support. For more info look in docs:
/// https://manpages.debian.org/stretch/libarchive-dev/archive_read_format.3.en.html
/// @param compression_method - Primary compression method to use. std::nullopt means 'all'.
- TarArchive(Source & source, bool raw = false, std::optional<std::string> compression_method = std::nullopt);
+ TarArchive(Source & source, bool raw = false, std::optional<CompressionAlgo> compression_method = std::nullopt);

/// Disable copy constructor. Explicitly default move assignment/constructor.
TarArchive(const TarArchive &) = delete;
9 changes: 6 additions & 3 deletions src/libutil/tarfile.cc
@@ -1,6 +1,8 @@
#include <archive.h>
#include <archive_entry.h>
+ #include <optional>

+ #include "nix/util/compression-algo.hh"
#include "nix/util/finally.hh"
#include "nix/util/serialise.hh"
#include "nix/util/tarfile.hh"
@@ -57,11 +59,12 @@ void TarArchive::check(int err, const std::string & reason)
/// Instead it's necessary to use this kludge to convert method -> code and
/// then use archive_read_support_filter_by_code. Arguably this is better than
/// hand-rolling the equivalent function that is better implemented in libarchive.
- int getArchiveFilterCodeByName(const std::string & method)
+ int getArchiveFilterCodeByName(const std::optional<CompressionAlgo> & method)
{
auto * ar = archive_write_new();
auto cleanup = Finally{[&ar]() { checkLibArchive(ar, archive_write_close(ar), "failed to close archive: %s"); }};
- auto err = archive_write_add_filter_by_name(ar, method.c_str());
+ auto err = archive_write_add_filter_by_name(
+     ar, showCompressionAlgo(method.value()).c_str()); // method.value_or(CompressionAlgo::none)
checkLibArchive(ar, err, "failed to get libarchive filter by name: %s");
auto code = archive_filter_code(ar, 0);
return code;
Comment on lines 64 to 70
Contributor commented:

Hm, now this whole function should be possible to delete. It was more of a hack anyway.

@xokdvium (Contributor) commented on Feb 15, 2026:

By this I mean that we can replace it with direct archive_read_support_filter_* calls, and get rid of the stringly typed code completely.

@@ -78,7 +81,7 @@ static void enableSupportedFormats(struct archive * archive)
archive_read_support_format_empty(archive);
}

- TarArchive::TarArchive(Source & source, bool raw, std::optional<std::string> compression_method)
+ TarArchive::TarArchive(Source & source, bool raw, std::optional<CompressionAlgo> compression_method)
: archive{archive_read_new()}
, source{&source}
, buffer(defaultBufferSize)