Rewrite the core of the binding generator. #37

emilio · 2016-08-24T09:22:29Z

TL;DR: The binding generator is a mess as of right now. At first it was fun
(in a "this is challenging" sense) to improve on it, but this is not
sustainable.

The truth is that the current architecture of the binding generator is a huge
pile of hacks, so these few days I've been working on rewriting it with a few
goals.

1) Have the hacks as contained and identified as possible. They're sometimes
   needed because of how clang exposes the AST, but ideally those hacks are
   well identified and don't interact randomly with each other.

   As an example, in the current bindgen, scanning the parameters of a
   function that references a struct clones all the struct information; if
   the struct name then changes (because we mangle it), everything breaks.

2) Support extending the bindgen output without having to deal with clang. The
   way I'm aiming to do this is by completely separating the parsing stage
   from the code generation one, and providing a single id for each item the
   binding generator provides.

3) No more random mutation of the internal representation from anywhere. That
   means no more `Rc<RefCell<T>>`, no more random circular references, no more
   `borrow_state`... nothing.

4) No more deduplication of declarations before code generation.

   Current bindgen has a stage, called `tag_dup_decl`[1], that takes care of
   deduplicating declarations. That's completely buggy, and for C++ it's a
   complete mess, since we YOLO modify the world.

   I've managed to get rid of this using the clang canonical declaration, and
   the definition, to avoid scanning any type/item twice.

5) Code generation should not modify any internal data structure. It can look
   things up and traverse whatever it needs, but not modify anything randomly.

6) Each item should have a canonical name, and a single source of mangling
   logic, and that should be computed from the immutable state, at code
   generation.

   I've put some `canonical_name` logic in the code generation phase, but it's
   still not complete, and should change if I implement namespaces.
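
To make goals 2 through 6 a bit more concrete, here is a minimal sketch of an id-based, immutable-after-parsing representation. It is not the code in this PR: `Context`, `add_item`, `resolve`, the `CanonicalDecl` key (standing in for whatever identifies a clang canonical declaration) and the `Parent_Child` mangling scheme are all assumptions made up for illustration.

```rust
use std::collections::HashMap;

/// Opaque handle to an item (hypothetical stand-in for the PR's `ItemId`).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ItemId(usize);

/// Hypothetical key identifying a clang canonical declaration.
type CanonicalDecl = u64;

#[allow(dead_code)]
#[derive(Debug)]
enum ItemKind {
    /// Fields refer to other items by id, not by `Rc<RefCell<_>>`.
    Struct { fields: Vec<ItemId> },
    Int,
}

#[allow(dead_code)]
#[derive(Debug)]
struct Item {
    name: String,
    parent: Option<ItemId>,
    kind: ItemKind,
}

/// Owns every item. Only the parsing stage gets `&mut Context`.
#[derive(Default)]
struct Context {
    items: Vec<Item>,
    parsed: HashMap<CanonicalDecl, ItemId>,
}

impl Context {
    /// Parsing side: a declaration that was already seen is never built
    /// twice, so no deduplication pass is needed later.
    fn add_item(&mut self, decl: CanonicalDecl, make: impl FnOnce() -> Item) -> ItemId {
        if let Some(&id) = self.parsed.get(&decl) {
            return id;
        }
        let id = ItemId(self.items.len());
        self.items.push(make());
        self.parsed.insert(decl, id);
        id
    }

    /// Codegen side: read-only lookup.
    fn resolve(&self, id: ItemId) -> &Item {
        &self.items[id.0]
    }

    /// Codegen side: the single place where names are mangled, computed
    /// from immutable state (the mangling scheme here is made up).
    fn canonical_name(&self, id: ItemId) -> String {
        let item = self.resolve(id);
        match item.parent {
            Some(parent) => format!("{}_{}", self.canonical_name(parent), item.name),
            None => item.name.clone(),
        }
    }
}
```

Because code generation only ever needs `&Context`, the borrow checker rules out the "mutate the world from anywhere" pattern by construction.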

Improvements pending until this can land:

1) Add support for missing core stuff, mainly generating functions (note that
   we parse the signatures for types correctly though), bitfields, generating
   C++ methods.
2) Add support for the necessary features that were added to work around some
   C++ pitfalls, like opaque types, etc...
3) Add support for the sugar that Manish added recently.
4) Optionally (and I guess this can land without it, because basically nobody
   uses it since it's so buggy), bring back namespace support.

These are not completely trivial, but I think I can do them quite easily with
the current architecture.

I'm putting the current state of affairs here as a request for comments...
Note that there are still a few smells I want to eventually
re-redesign, like the ParseError::Recurse thing, but until that happens I'm
way happier with this kind of architecture.

I'm keeping the old parser.rs and gen.rs in tree just for reference while I
code, but they will go away.

highfive · 2016-08-24T09:22:34Z

Warning

These commits modify unsafe code. Please review it carefully!

emilio · 2016-08-24T09:23:15Z

cc: @nox, @bholley, @Manishearth, @vvuk, @Yamakaky, any thoughts on this approach?

nox · 2016-08-24T09:45:44Z

src/gen.rs

-        // a dummy field with pointer-alignment to supress it.
+        // FIXME: rustc actually generates tons of warnings due to an empty
+        // repr(C) type, so we just generate a dummy field with pointer
+        // alignment to supress it.


What about Foo(c_void)?

Wfm, this is just an accidental comment rewrapping fwiw.

nox · 2016-08-24T09:46:38Z

This looks great!

Yamakaky · 2016-08-24T12:25:22Z

Yes! The Rc<Borrow> especially looks awful.
While we are at it, it would be cool to check if all the features in https://github.com/Yamakaky/rust-bindgen are present in this fork. That way, we could close the other and use this one. (see #21)
While you are rewriting a lot of things, you could use https://github.com/KyleMayes/clang-rs.

Yamakaky · 2016-08-24T12:40:55Z

Also, I did some refactoring myself; it could help you.

emilio · 2016-08-25T08:12:27Z

@Yamakaky I plan to focus on the main architecture change.

Using clang-rs vs our hand-rolled thing would be great (never heard of that library before, but looks nice). That being said, this patch is already ~4k lines, and I don't want to make it larger unnecessarily.

Other parts that I don't plan to touch right now are argument parsing and such; feel free to help with them if you feel like it :-)

Yamakaky · 2016-08-25T09:06:44Z

Maybe clang-rs should be integrated before you make too many changes. Otherwise some of that work would be for nothing.

emilio · 2016-08-25T17:15:01Z

Clang-rs is only a wrapper for types like Cursor or Type, it doesn't significantly change the architecture.

Yamakaky · 2016-08-25T17:38:45Z

(inviting @KyleMayes in the conversation)

emilio · 2016-08-26T11:34:07Z

Current status: all the tests produce compilable code, and I've been able to generate bindings for Gecko without crashing, parsing all the relevant STL stuff and mfbt (obviously, the resulting code didn't compile due to SFINAE and similar stuff).

I hope to be able to fully support C++ namespaces, though I'm going to keep working on getting the features up to date with current bindgen before doing more tests like that.

bors-servo · 2016-08-27T07:12:26Z

☔ The latest upstream changes (presumably #44) made this pull request unmergeable. Please resolve the merge conflicts.

emilio · 2016-09-02T06:34:38Z

Hey, I've heard over there that this passes some tests already :-)

nox · 2016-09-15T09:03:00Z

src/codegen/mod.rs

+                //     fn clone(&self) -> Self { *self }
+                // doesn't work for templates.
+                //
+                // It's not hard to fix though.


File an easy issue? :)
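
For context, a small sketch of why `*self` is awkward for generic types. This is an illustration rather than the generated code from this PR: `*self` requires `Self: Copy`, and `#[derive(Copy)]` on a generic struct adds a `T: Copy` bound, so a template instantiated with a non-`Copy` parameter would not get the impl. The `Wrapper` type below is hypothetical; one way around the bound is to write both impls by hand.

```rust
use std::marker::PhantomData;

/// Hypothetical binding for a C++ `template <typename T> struct Wrapper`.
#[repr(C)]
pub struct Wrapper<T> {
    pub ptr: *mut T,
    pub _marker: PhantomData<T>,
}

// `*mut T` and `PhantomData<T>` are `Copy` for every `T`, so the impl can be
// written without the `T: Copy` bound that `#[derive(Copy)]` would add.
impl<T> Copy for Wrapper<T> {}

impl<T> Clone for Wrapper<T> {
    fn clone(&self) -> Self {
        // Allowed because `Wrapper<T>: Copy` holds for all `T`.
        *self
    }
}
```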

nox · 2016-09-15T09:03:29Z

src/codegen/mod.rs

+            constness: ast::Constness::NotConst,
+        };
+
+        // TODO: This needs to be kept in sync, so we'd be better unifying this,


Kept in sync with what?

With the other loop that decides argument naming. Clarified it in the comment.

nox · 2016-09-15T09:05:54Z

src/codegen/mod.rs

+    fn collect_types(&self,
+                     context: &BindgenContext,
+                     types: &mut ItemSet,
+                     extra: &Self::Extra);


Wouldn't it be simpler to pass Self::Extra by value?

At first I thought so, but that implies either parameterizing the trait and type on a lifetime and whatnot, or passing Items by value, which is both inefficient and not the intended thing. I tried to keep this simple since extra is usually just the Item that refers to us, but if that changes we can refactor the interface.
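
A rough sketch of the trade-off described above, with made-up types. Only the `collect_types` signature comes from the diff; the trait name `TypeCollector`, the placeholder `BindgenContext`, and the `Item`/`Type` layouts are assumptions. With `extra: &Self::Extra`, the associated type can simply be `Item`; passing it by value would need either `Extra = &'a Item` (and a lifetime parameter on the trait) or moving the `Item` itself.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ItemId(usize);

type ItemSet = BTreeSet<ItemId>;

/// Placeholder; the real context holds the whole IR.
struct BindgenContext;

/// Hypothetical, heavily simplified IR types.
struct Type {
    referenced: Vec<ItemId>,
}

struct Item {
    id: ItemId,
    ty: Type,
}

trait TypeCollector {
    type Extra;

    fn collect_types(&self,
                     context: &BindgenContext,
                     types: &mut ItemSet,
                     extra: &Self::Extra);
}

impl TypeCollector for Type {
    // No lifetime needed: the item that refers to us is only borrowed.
    type Extra = Item;

    fn collect_types(&self,
                     _context: &BindgenContext,
                     types: &mut ItemSet,
                     extra: &Item) {
        types.insert(extra.id);
        types.extend(self.referenced.iter().copied());
    }
}
```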

nox · 2016-09-15T09:06:32Z

src/codegen/mod.rs

+                }
+            }
+
+            fn contains_parent(ctx: &BindgenContext, types: &ItemSet, id: ItemId) -> bool {


Bikeshed: is_recursive?

Hmm... This doesn't check for any kind of recursion, this only checks that a given parent of ours isn't already whitelisted, since generating code for it will generate code for us anyway.
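
As a sketch of that check, assuming a hypothetical context that stores each item's parent in a map (the real `BindgenContext` is not shown in this thread, and the parent chain is assumed to be acyclic):

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct ItemId(usize);

type ItemSet = BTreeSet<ItemId>;

/// Hypothetical stand-in: child id -> parent id; roots have no entry.
struct BindgenContext {
    parents: HashMap<ItemId, ItemId>,
}

/// Returns true if any ancestor of `id` is already in `types`, in which case
/// generating code for that ancestor will also cover `id`.
fn contains_parent(ctx: &BindgenContext, types: &ItemSet, id: ItemId) -> bool {
    let mut current = id;
    while let Some(&parent) = ctx.parents.get(&current) {
        if types.contains(&parent) {
            return true;
        }
        current = parent;
    }
    false
}
```

Note that the item itself being in the set does not count; only its ancestors are inspected.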

nox · 2016-09-15T09:07:53Z

src/ir/comp.rs

+        self.bitfield
+    }
+
+    pub fn mutable(&self) -> bool {


Bikeshed: is_mutable?

nox · 2016-09-15T09:08:16Z

src/ir/annotations.rs

+        }
+    }
+
+    pub fn hide(&self) -> bool {


Bikeshed: is_hidden?

nox · 2016-09-15T09:08:42Z

src/ir/annotations.rs

+        self.hide
+    }
+
+    pub fn opaque(&self) -> bool {


Bikeshed: is_opaque?

nox · 2016-09-15T09:09:01Z

src/ir/annotations.rs

+        self.opaque
+    }
+
+    pub fn use_as(&self) -> Option<&str> {


I don't understand this method.

This is used to implement the replaces annotation. I've given it a more meaningful name and a doc comment :)

nox · 2016-09-15T09:09:28Z

src/ir/annotations.rs

+        self.use_as.as_ref().map(|s| &**s)
+    }
+
+    pub fn no_copy(&self) -> bool {


I don't understand this method, and the name should probably not be a negation.

nox · 2016-09-15T09:09:48Z

src/ir/annotations.rs

+        self.no_copy
+    }
+
+    pub fn private_fields(&self) -> Option<bool> {


Bikeshed: has_private_fields?

This does not indicate whether a type has private fields, but whether it should generate private fields. I'll give it a meaningful name.

nox

Started the review, but I am far from finishing it.

emilio · 2016-09-15T16:54:51Z

I addressed a bunch of review comments, and cleaned up a bit more. Pushed as to-squash commits to ease the review.

nox

I don't have much to say about it now; the changes look correct, but it's not like I can check every single line in detail.

Squash and I'll r+.

nox · 2016-09-16T09:50:54Z

src/codegen/mod.rs

+/// This is done due to aster's lack of pointer builder, I guess I should PR
+/// there.
+trait ToPtr {
+    fn to_ptr(self, is_const: bool, span: Span) -> P<ast::Ty>;


nox · 2016-09-16T09:52:14Z

src/codegen/mod.rs

+    fn collect_types(&self,
+                     context: &BindgenContext,
+                     types: &mut ItemSet,
+                     extra: &Self::Extra);


nox · 2016-09-16T09:52:55Z

src/codegen/mod.rs

+                }
+            }
+
+            fn contains_parent(ctx: &BindgenContext, types: &ItemSet, id: ItemId) -> bool {



emilio · 2016-09-21T08:54:30Z

review ping @nox, is this good to land?

nox · 2016-09-21T09:44:08Z

@bors-servo r+

bors-servo · 2016-09-21T09:44:09Z

📌 Commit da69afd has been approved by nox

bors-servo · 2016-09-21T09:44:12Z

⚡ Test exempted - status

[1]: https://github.com/Yamakaky/rust-bindgen/blob/master/src/gen.rs#L448

highfive added the S-awaiting-review label Aug 24, 2016

nox reviewed Aug 24, 2016

emilio force-pushed the v2 branch from 16f2c95 to 341ccee Compare August 25, 2016 04:04

emilio mentioned this pull request Aug 25, 2016

Document -blacklist-type SomeType #40

Closed

emilio force-pushed the v2 branch from 612d2ee to 98f098c Compare August 25, 2016 08:05

emilio force-pushed the v2 branch 2 times, most recently from 344a41f to 524cae9 Compare August 26, 2016 11:30

highfive added the S-needs-rebase label Aug 27, 2016

emilio force-pushed the v2 branch 3 times, most recently from 532fd14 to de9eb58 Compare September 2, 2016 06:30

emilio force-pushed the v2 branch 2 times, most recently from f7adf26 to c2aaa08 Compare September 12, 2016 23:59

highfive removed the S-needs-rebase label Sep 12, 2016

emilio force-pushed the v2 branch 4 times, most recently from 5a9b443 to 34a748a Compare September 13, 2016 19:29

nox reviewed Sep 15, 2016

nox reviewed Sep 15, 2016

nox reviewed Sep 15, 2016

nox suggested changes Sep 15, 2016

emilio force-pushed the v2 branch from 2bdba43 to b68a40a Compare September 15, 2016 16:53

nox approved these changes Sep 16, 2016
emilio added 2 commits September 16, 2016 11:34

Back out docopt.

da69afd

emilio force-pushed the v2 branch from b68a40a to da69afd Compare September 16, 2016 19:43

highfive added S-awaiting-merge and removed S-awaiting-review labels Sep 21, 2016

bors-servo merged commit da69afd into rust-lang:master Sep 21, 2016

highfive removed the S-awaiting-merge label Sep 21, 2016

emilio mentioned this pull request Sep 22, 2016

docopt takes forever #46

Closed

emilio deleted the v2 branch September 22, 2016 02:04

emilio mentioned this pull request Nov 22, 2016

Option for only generating bindings for filenames matching a pattern #303

Closed

Manishearth unassigned nox Jul 25, 2017
