-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite the core of the binding generator. #37
Conversation
cc: @nox, @bholley, @Manishearth, @vvuk, @Yamakaky, any thoughts on this approach? |
// a dummy field with pointer-alignment to supress it. | ||
// FIXME: rustc actually generates tons of warnings due to an empty | ||
// repr(C) type, so we just generate a dummy field with pointer | ||
// alignment to supress it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Foo(c_void)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wfm, this is just an accidental comment rewrapping fwiw.
This looks great! |
Yes! The |
Also, I made some refactoring myself, it could help you |
@Yamakaky I planned to focus in the main architecture change. Using clang-rs vs our hand-rolled thing would be great (never heard of that library before, but looks nice). That being said, this patch is already ~4k lines, and I don't want to make it larger unnecessarily. Other parts that I don't plan to touch right now is argument parsing and such, feel free to help with it if you feel like it :-) |
Maybe clang-rs should be integrated before you make too many changes. It would have been work for nothing. |
Clang-rs is only a wrapper for types like |
(inviting @KyleMayes in the conversation) |
344a41f
to
524cae9
Compare
Current status: all the tests produce compilable code, I've been able to generate bindings for gecko without crashing, parsing all the relevant stl stuff and mfbt (obviously, the resulting code didn't compile due to sfinae and similar stuff). I hope to be able to fully support c++ namespaces, though I'm going to keep working in getting the features up to date with current bindgen before doing more tests like that. |
☔ The latest upstream changes (presumably #44) made this pull request unmergeable. Please resolve the merge conflicts. |
532fd14
to
de9eb58
Compare
Hey, I've heard over there that this passes some tests already :-) |
f7adf26
to
c2aaa08
Compare
5a9b443
to
34a748a
Compare
// fn clone(&self) -> Self { *self } | ||
// doesn't work for templates. | ||
// | ||
// It's not hard to fix though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File an easy issue? :)
constness: ast::Constness::NotConst, | ||
}; | ||
|
||
// TODO: This needs to be kept in sync, so we'd be better unifying this, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kept in sync with what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the other loop that decides argument naming. Clarified it in the comment.
fn collect_types(&self, | ||
context: &BindgenContext, | ||
types: &mut ItemSet, | ||
extra: &Self::Extra); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be simpler to pass Self::Extra
by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I thought so, but that implies either parameterizing the trait and type on a lifetime and whatnot, or pass Item
s by value, which is both inefficient and not the intended thing. I tried to keep this simple since extra
uses to be just the Item
that refers to us, but if that changes we can refactor the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
} | ||
} | ||
|
||
fn contains_parent(ctx: &BindgenContext, types: &ItemSet, id: ItemId) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshed: is_recursive
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... This doesn't check for any kind of recursion, this only checks that a given parent of ours isn't already whitelisted, since generating code for it will generate code for us anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
self.bitfield | ||
} | ||
|
||
pub fn mutable(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshed: is_mutable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure :)
} | ||
} | ||
|
||
pub fn hide(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshed: is_hidden
?
self.hide | ||
} | ||
|
||
pub fn opaque(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshed: is_opaque
?
self.opaque | ||
} | ||
|
||
pub fn use_as(&self) -> Option<&str> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used to implement the replaces
annotation. I've given it a more meaningful name and a doc comment :)
self.use_as.as_ref().map(|s| &**s) | ||
} | ||
|
||
pub fn no_copy(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this method, and the name should probably not be a negation.
self.no_copy | ||
} | ||
|
||
pub fn private_fields(&self) -> Option<bool> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshed: has_private_fields
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not indicate whether a type has private fields, but whether it should generate private fields. I'll give it a meaningful name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started the review, but I am far from finishing it.
I addressed a bunch of review comments, and cleaned up a bit more. Pushed as to-squash commits to ease the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much to say about it now, the changes look correct but it's not like I can check every single line in details.
Squash and I'll r+.
/// This is done due to aster's lack of pointer builder, I guess I should PR | ||
/// there. | ||
trait ToPtr { | ||
fn to_ptr(self, is_const: bool, span: Span) -> P<ast::Ty>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
fn collect_types(&self, | ||
context: &BindgenContext, | ||
types: &mut ItemSet, | ||
extra: &Self::Extra); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
} | ||
} | ||
|
||
fn contains_parent(ctx: &BindgenContext, types: &ItemSet, id: ItemId) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
TL;DR: The binding generator is a mess as of right now. At first it was funny (in a "this is challenging" sense) to improve on it, but this is not sustainable. The truth is that the current architecture of the binding generator is a huge pile of hacks, so these few days I've been working on rewriting it with a few goals. 1) Have the hacks as contained and identified as possible. They're sometimes needed because how clang exposes the AST, but ideally those hacks are well identified and don't interact randomly with each others. As an example, in the current bindgen when scanning the parameters of a function that references a struct clones all the struct information, then if the struct name changes (because we mangle it), everything breaks. 2) Support extending the bindgen output without having to deal with clang. The way I'm aiming to do this is separating completely the parsing stage from the code generation one, and providing a single id for each item the binding generator provides. 3) No more random mutation of the internal representation from anywhere. That means no more Rc<RefCell<T>>, no more random circular references, no more borrow_state... nothing. 4) No more deduplication of declarations before code generation. Current bindgen has a stage, called `tag_dup_decl`[1], that takes care of deduplicating declarations. That's completely buggy, and for C++ it's a complete mess, since we YOLO modify the world. I've managed to take rid of this using the clang canonical declaration, and the definition, to avoid scanning any type/item twice. 5) Code generation should not modify any internal data structure. It can lookup things, traverse whatever it needs, but not modifying randomly. 6) Each item should have a canonical name, and a single source of mangling logic, and that should be computed from the inmutable state, at code generation. I've put a few canonical_name stuff in the code generation phase, but it's still not complete, and should change if I implement namespaces. Improvements pending until this can land: 1) Add support for missing core stuff, mainly generating functions (note that we parse the signatures for types correctly though), bitfields, generating C++ methods. 2) Add support for the necessary features that were added to work around some C++ pitfalls, like opaque types, etc... 3) Add support for the sugar that Manish added recently. 4) Optionally (and I guess this can land without it, because basically nobody uses it since it's so buggy), bring back namespace support. These are not completely trivial, but I think I can do them quite easily with the current architecture. I'm putting the current state of affairs here as a request for comments... Any thoughts? Note that there are still a few smells I want to eventually re-redesign, like the ParseError::Recurse thing, but until that happens I'm way happier with this kind of architecture. I'm keeping the old `parser.rs` and `gen.rs` in tree just for reference while I code, but they will go away. [1]: https://github.com/Yamakaky/rust-bindgen/blob/master/src/gen.rs#L448
review ping @nox, is this good to land? |
@bors-servo r+ |
📌 Commit da69afd has been approved by |
⚡ Test exempted - status |
Rewrite the core of the binding generator. TL;DR: The binding generator is a mess as of right now. At first it was funny (in a "this is challenging" sense) to improve on it, but this is not sustainable. The truth is that the current architecture of the binding generator is a huge pile of hacks, so these few days I've been working on rewriting it with a few goals. 1) Have the hacks as contained and identified as possible. They're sometimes needed because how clang exposes the AST, but ideally those hacks are well identified and don't interact randomly with each others. As an example, in the current bindgen when scanning the parameters of a function that references a struct clones all the struct information, then if the struct name changes (because we mangle it), everything breaks. 2) Support extending the bindgen output without having to deal with clang. The way I'm aiming to do this is separating completely the parsing stage from the code generation one, and providing a single id for each item the binding generator provides. 3) No more random mutation of the internal representation from anywhere. That means no more `Rc<RefCell<T>>`, no more random circular references, no more borrow_state... nothing. 4) No more deduplication of declarations before code generation. Current bindgen has a stage, called `tag_dup_decl`[1], that takes care of deduplicating declarations. That's completely buggy, and for C++ it's a complete mess, since we YOLO modify the world. I've managed to take rid of this using the clang canonical declaration, and the definition, to avoid scanning any type/item twice. 5) Code generation should not modify any internal data structure. It can lookup things, traverse whatever it needs, but not modifying randomly. 6) Each item should have a canonical name, and a single source of mangling logic, and that should be computed from the immutable state, at code generation. I've put a few canonical_name stuff in the code generation phase, but it's still not complete, and should change if I implement namespaces. Improvements pending until this can land: 1) Add support for missing core stuff, mainly generating functions (note that we parse the signatures for types correctly though), bitfields, generating C++ methods. 2) Add support for the necessary features that were added to work around some C++ pitfalls, like opaque types, etc... 3) Add support for the sugar that Manish added recently. 4) Optionally (and I guess this can land without it, because basically nobody uses it since it's so buggy), bring back namespace support. These are not completely trivial, but I think I can do them quite easily with the current architecture. I'm putting the current state of affairs here as a request for comments... Note that there are still a few smells I want to eventually re-redesign, like the ParseError::Recurse thing, but until that happens I'm way happier with this kind of architecture. I'm keeping the old `parser.rs` and `gen.rs` in tree just for reference while I code, but they will go away. [1]: https://github.com/Yamakaky/rust-bindgen/blob/master/src/gen.rs#L448
TL;DR: The binding generator is a mess as of right now. At first it was funny
(in a "this is challenging" sense) to improve on it, but this is not
sustainable.
The truth is that the current architecture of the binding generator is a huge
pile of hacks, so these few days I've been working on rewriting it with a few
goals.
needed because how clang exposes the AST, but ideally those hacks are well
identified and don't interact randomly with each others.
Support extending the bindgen output without having to deal with clang. The
way I'm aiming to do this is separating completely the parsing stage from
the code generation one, and providing a single id for each item the binding
generator provides.
No more random mutation of the internal representation from anywhere. That
means no more
Rc<RefCell<T>>
, no more random circular references, no moreborrow_state... nothing.
No more deduplication of declarations before code generation.
Code generation should not modify any internal data structure. It can lookup
things, traverse whatever it needs, but not modifying randomly.
Each item should have a canonical name, and a single source of mangling
logic, and that should be computed from the immutable state, at code
generation.
Improvements pending until this can land:
Add support for missing core stuff, mainly generating functions (note that
we parse the signatures for types correctly though), bitfields, generating
C++ methods.
Add support for the necessary features that were added to work around some
C++ pitfalls, like opaque types, etc...
Add support for the sugar that Manish added recently.
Optionally (and I guess this can land without it, because basically nobody
uses it since it's so buggy), bring back namespace support.
These are not completely trivial, but I think I can do them quite easily with
the current architecture.
I'm putting the current state of affairs here as a request for comments...
Note that there are still a few smells I want to eventually
re-redesign, like the ParseError::Recurse thing, but until that happens I'm
way happier with this kind of architecture.
I'm keeping the old
parser.rs
andgen.rs
in tree just for reference while Icode, but they will go away.