-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
nixexpr: introduce arena to hold ExprString strings #14090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Solved merge conflict and incorperated formatting commit. |
|
ASAN is not happy. |
|
also reproducible locally. |
|
cc @NaN-git |
Did I clobber these again? I pulled before pushing but I don't see your commits 🙈 EDIT: oh never mind, I see you rolled them into the existing commits. |
Yep.. I fixed the bugs and added asserts to make sure they don't happen sneakily again. (they weren't that sneaky, but I had forgotten to run tests before and was just checking whether a NixOS configuration built) |
0f6732b to
6863c10
Compare
ff3d0ba to
966b855
Compare
| assert(c > 0); | ||
| s2[c] = '\0'; | ||
|
|
||
| es2->emplace_back(i->first, new ExprString(s2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether it's possible to allocate the AST nodes with the buffer allocator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is supposed to come next? See the main issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm intending to move the AST nodes into big vectors. How exactly that relates to the buffer allocator isn't clear to me yet since they have to be able to move to accommodate growth.
But indeed I haven't forgotten to consider where the Exprs are going, I'm just trying to do things one small step at a time.
fa9fdad to
aeb391f
Compare
| std::vector<nix::AttrName> * attrNames; | ||
| std::vector<std::pair<nix::AttrName, nix::PosIdx>> * inheritAttrs; | ||
| std::vector<std::pair<nix::PosIdx, nix::Expr *>> * string_parts; | ||
| std::variant<nix::Expr *, std::string_view> * to_be_string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this makes me wish we could move to a C++ skeleton in bison. Then we'd be able to use proper variant types for values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what that means "C++ skeleton in bison".
I kind of have an itching to move us to a recursive descent parser, but I'm not sure other people would like that, and in any case that should definitely be after we have a fuzzer that can thoroughly check that they're equivalent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.gnu.org/software/bison/manual/html_node/A-Simple-C_002b_002b-Example.html
Then we wouldn't have to use the %union at all and instead switch to %define api.value.type variant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added now in #14105
c18d9b9 to
1db9336
Compare
ca78783 to
3ef9c79
Compare
1. Saves 24-32 bytes per string (size of std::string) 2. Saves additional bytes by not over-allocating strings (in total we save ~1% memory) 3. Sets us up to perform a similar transformation on the other Expr subclasses 4. Makes ExprString trivially moveable (before the string data might move, causing the Value's pointer to become invalid). This is important so we can put ExprStrings in an std::vector and refer to them by index We have introduced a string copy in ParserState::stripIndentation(). This could be removed by pre-allocating the right sized string in the arena, but this adds complexity and doesn't seem to improve performance, so for now we've left the copy in.
3ef9c79 to
eab467e
Compare
⚠️ built on top of #14047(merged)Motivation
See this tracking issue for more big-picture motivation.
Where does this fall in that picture?
std::stringdepending on standard library implementation)Exprsstruct, which will eventually contain arrays for each of theExprtypesFor indentation strings, allocate the memory needed up-front, rather than repeatedly resizing an std::string to accommodate. We may also have been over-allocating as a result (I'm not sure what the reallocation strategy ofstd::stringis)Exprs live in astd::vector, they will need to be trivially moveable. Because of the small string optimization ofstd::string, moving anExprStringinvalidates itsValue's pointer.Note that this does get rid of the small-string optimization built into
std::string. That's part of the idea, but it also means for small strings, we now have an extra pointer indirection where there wasn't one before. I didn't find any noticeable slowdown because of this.I am also opening another PR to add SSO to Values themselves. If that one is accepted, I will modify this one to take advantage of it.
Context
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.