@Radvendii
Contributor

@Radvendii Radvendii commented Sep 26, 2025

⚠️ built on top of #14047 (merged)

Motivation

See this tracking issue for more big-picture motivation.

Where does this fall in that picture?

  • Removes 24/32 bytes per string in the AST (the size of an std::string, depending on the standard library implementation)
    • Saves ~10MB (~1%) memory when evaluating a NixOS configuration
  • Introduces an arena for other Expr-related data to live in
  • Introduces Exprs struct, which will eventually contain arrays for each of the Expr types
  • For indentation strings, allocate the memory needed up-front, rather than repeatedly resizing an std::string to accommodate. We may also have been over-allocating as a result (I'm not sure what the reallocation strategy of std::string is)
  • Once Exprs live in a std::vector, they will need to be trivially moveable. Because of the small string optimization of std::string, moving an ExprString invalidates its Value's pointer.
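A minimal sketch of the up-front allocation idea from the list above, assuming the arena is backed by `std::pmr::monotonic_buffer_resource`. The `StringArena` type and its `intern` method are hypothetical names for illustration, not the types introduced by this PR.

```cpp
#include <cassert>
#include <cstring>
#include <memory_resource>
#include <string_view>

// Hypothetical stand-in for an arena that owns Expr-related string data.
struct StringArena {
    // Hands out memory from growing chunks; individual allocations are
    // never freed, everything is released when the arena is destroyed.
    std::pmr::monotonic_buffer_resource buffer;

    // Copy `s` into the arena as a NUL-terminated string. Exactly
    // s.size() + 1 bytes are requested, in a single allocation.
    const char * intern(std::string_view s) {
        char * p = static_cast<char *>(buffer.allocate(s.size() + 1, alignof(char)));
        if (!s.empty())
            std::memcpy(p, s.data(), s.size());
        p[s.size()] = '\0';
        return p;
    }
};
```

Because the final length is known before allocating, there is no repeated resizing, and the only per-string state a node needs to keep is the returned pointer rather than a whole `std::string` object.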

Note that this does get rid of the small-string optimization built into std::string. That's part of the idea, but it also means for small strings, we now have an extra pointer indirection where there wasn't one before. I didn't find any noticeable slowdown because of this.
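To illustrate why the small-string optimization matters for moves, here is a small sketch (the helper name `dataMovesOnMove` is made up for this example). It checks whether the character data ends up at a different address after a `std::string` is move-constructed.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Returns true if move-constructing a new string from `s` leaves the
// character data at a different address than it had inside `s`.
bool dataMovesOnMove(std::string s) {
    const char * before = s.data();
    std::string moved = std::move(s);
    return moved.data() != before;
}
```

On mainstream standard libraries, a short string is stored inside the `std::string` object itself, so the helper is expected to return true for something like `"hi"`, while a long heap-allocated string typically just hands over its buffer. A pointer taken to the data of a short string therefore dangles once the owning object is moved, which is the situation described for `ExprString` and its `Value`.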

I am also opening another PR to add SSO to Values themselves. If that one is accepted, I will modify this one to take advantage of it.

Context


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the labels new-cli (Relating to the "nix" command), repl (The Read Eval Print Loop, "nix repl" command and debugger), and c api (Nix as a C library with a stable interface) on Sep 26, 2025
@Radvendii Radvendii mentioned this pull request Sep 26, 2025
42 tasks
@Mic92
Member

Mic92 commented Sep 26, 2025

Solved merge conflict and incorporated formatting commit.

@Mic92
Member

Mic92 commented Sep 26, 2025

ASAN is not happy.

@Mic92
Member

Mic92 commented Sep 26, 2025

also reproducible locally.

@Mic92
Member

Mic92 commented Sep 26, 2025

cc @NaN-git

@Radvendii
Contributor Author

Radvendii commented Sep 26, 2025

Solved merge conflict and incorporated formatting commit.

Did I clobber these again? I pulled before pushing but I don't see your commits 🙈

EDIT: oh never mind, I see you rolled them into the existing commits.

@Radvendii
Contributor Author

ASAN is not happy.

Yep.. I fixed the bugs and added asserts to make sure they don't happen sneakily again. (they weren't that sneaky, but I had forgotten to run tests before and was just checking whether a NixOS configuration built)

@Radvendii Radvendii force-pushed the expr-slim branch 3 times, most recently from 0f6732b to 6863c10, on September 26, 2025 21:17
assert(c > 0);
s2[c] = '\0';

es2->emplace_back(i->first, new ExprString(s2));
Contributor

I'm wondering whether it's possible to allocate the AST nodes with the buffer allocator.

Member

I think that is supposed to come next? See the main issue.

Contributor Author

I'm intending to move the AST nodes into big vectors. How exactly that relates to the buffer allocator isn't clear to me yet since they have to be able to move to accommodate growth.

But indeed I haven't forgotten to consider where the Exprs are going, I'm just trying to do things one small step at a time.
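A rough sketch of the "big vectors" direction, under the assumption that nodes are referred to by index rather than by pointer. All names here (`ExprInt`, `ExprIntId`, `ExprInts`) are hypothetical and only meant to show why indices survive growth while pointers would not.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical node type; trivially copyable so the vector can relocate it
// freely when it grows.
struct ExprInt {
    long value;
};
static_assert(std::is_trivially_copyable_v<ExprInt>);

// Hypothetical handle: an index into the vector instead of a pointer.
struct ExprIntId {
    std::size_t index;
};

struct ExprInts {
    std::vector<ExprInt> nodes;

    // Append a node and return a handle that stays valid across reallocation.
    ExprIntId add(long v) {
        nodes.push_back(ExprInt{v});
        return ExprIntId{nodes.size() - 1};
    }

    const ExprInt & get(ExprIntId id) const { return nodes[id.index]; }
};
```

A raw `ExprInt *` taken before a later `add` call may dangle once the vector reallocates, whereas an `ExprIntId` keeps pointing at the same logical node. This is also why node types stored this way cannot contain self-referential data such as an SSO buffer that something else points into.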

@Radvendii Radvendii force-pushed the expr-slim branch 2 times, most recently from fa9fdad to aeb391f, on September 28, 2025 11:48
std::vector<nix::AttrName> * attrNames;
std::vector<std::pair<nix::AttrName, nix::PosIdx>> * inheritAttrs;
std::vector<std::pair<nix::PosIdx, nix::Expr *>> * string_parts;
std::variant<nix::Expr *, std::string_view> * to_be_string;
Contributor

All this makes me wish we could move to a C++ skeleton in bison. Then we'd be able to use proper variant types for values.

Contributor Author

I'm not sure what "C++ skeleton in bison" means.

I kind of have an itch to move us to a recursive descent parser, but I'm not sure other people would like that, and in any case that should definitely come after we have a fuzzer that can thoroughly check that the two parsers are equivalent.

Contributor

https://www.gnu.org/software/bison/manual/html_node/A-Simple-C_002b_002b-Example.html

Then we wouldn't have to use the %union at all and instead switch to %define api.value.type variant
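For readers unfamiliar with the idea, here is a sketch of what a value-typed variant looks like on the C++ side, using `std::variant` directly instead of a pointer to one stored in a union. Bison's `variant` value type is its own mechanism, so this only illustrates the general shape; `Expr`, `StringPart`, and `describe` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical minimal node type, standing in for nix::Expr.
struct Expr {
    std::string label;
};

// Either an already-built expression or a raw piece of string text,
// mirroring the shape of the `to_be_string` member in the diff above.
using StringPart = std::variant<Expr *, std::string_view>;

// Inspect the active alternative without any manual tag bookkeeping.
std::string describe(const StringPart & part) {
    if (auto * e = std::get_if<Expr *>(&part))
        return "expr:" + (*e)->label;
    return "text:" + std::string(std::get<std::string_view>(part));
}
```

Holding such a value directly, rather than behind a heap-allocated pointer in a `%union`, removes one allocation and one indirection per semantic value.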

Member

@Mic92 Mic92 Sep 29, 2025

This was added now in #14105

1. Saves 24-32 bytes per string (size of std::string)
2. Saves additional bytes by not over-allocating strings (in total we
save ~1% memory)
3. Sets us up to perform a similar transformation on the other Expr
subclasses
4. Makes ExprString trivially moveable (before the string data might
move, causing the Value's pointer to become invalid). This is important
so we can put ExprStrings in an std::vector and refer to them by index

We have introduced a string copy in ParserState::stripIndentation().
This could be removed by pre-allocating the right sized string in the
arena, but this adds complexity and doesn't seem to improve performance,
so for now we've left the copy in.
@Mic92 Mic92 merged commit 7cbc0f9 into NixOS:master Sep 29, 2025
16 checks passed