Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the core of the binding generator. #37

Merged
merged 2 commits into from
Sep 21, 2016
Merged

Conversation

emilio
Copy link
Contributor

@emilio emilio commented Aug 24, 2016

TL;DR: The binding generator is a mess as of right now. At first it was funny
(in a "this is challenging" sense) to improve on it, but this is not
sustainable.

The truth is that the current architecture of the binding generator is a huge
pile of hacks, so these few days I've been working on rewriting it with a few
goals.

  1. Have the hacks as contained and identified as possible. They're sometimes
    needed because how clang exposes the AST, but ideally those hacks are well
    identified and don't interact randomly with each others.
As an example, in the current bindgen when scanning the parameters of a
function that references a struct clones all the struct information, then if
the struct name changes (because we mangle it), everything breaks.
  1. Support extending the bindgen output without having to deal with clang. The
    way I'm aiming to do this is separating completely the parsing stage from
    the code generation one, and providing a single id for each item the binding
    generator provides.

  2. No more random mutation of the internal representation from anywhere. That
    means no more Rc<RefCell<T>>, no more random circular references, no more
    borrow_state... nothing.

  3. No more deduplication of declarations before code generation.

Current bindgen has a stage, called `tag_dup_decl`[1], that takes care of
deduplicating declarations. That's completely buggy, and for C++ it's a
complete mess, since we YOLO modify the world.

I've managed to take rid of this using the clang canonical declaration, and
the definition, to avoid scanning any type/item twice.
  1. Code generation should not modify any internal data structure. It can lookup
    things, traverse whatever it needs, but not modifying randomly.

  2. Each item should have a canonical name, and a single source of mangling
    logic, and that should be computed from the immutable state, at code
    generation.

I've put a few canonical_name stuff in the code generation phase, but it's
still not complete, and should change if I implement namespaces.

Improvements pending until this can land:

  1. Add support for missing core stuff, mainly generating functions (note that
    we parse the signatures for types correctly though), bitfields, generating
    C++ methods.

  2. Add support for the necessary features that were added to work around some
    C++ pitfalls, like opaque types, etc...

  3. Add support for the sugar that Manish added recently.

  4. Optionally (and I guess this can land without it, because basically nobody
    uses it since it's so buggy), bring back namespace support.

These are not completely trivial, but I think I can do them quite easily with
the current architecture.

I'm putting the current state of affairs here as a request for comments...
Note that there are still a few smells I want to eventually
re-redesign, like the ParseError::Recurse thing, but until that happens I'm
way happier with this kind of architecture.

I'm keeping the old parser.rs and gen.rs in tree just for reference while I
code, but they will go away.

@highfive
Copy link

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!

@emilio
Copy link
Contributor Author

emilio commented Aug 24, 2016

cc: @nox, @bholley, @Manishearth, @vvuk, @Yamakaky, any thoughts on this approach?

// a dummy field with pointer-alignment to supress it.
// FIXME: rustc actually generates tons of warnings due to an empty
// repr(C) type, so we just generate a dummy field with pointer
// alignment to supress it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Foo(c_void)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wfm, this is just an accidental comment rewrapping fwiw.

@nox
Copy link
Contributor

nox commented Aug 24, 2016

This looks great!

@Yamakaky
Copy link
Contributor

Yamakaky commented Aug 24, 2016

Yes! The Rc<Borrow> especially looks awful.
While we are at it, it would be cool to check if all the features in https://github.com/Yamakaky/rust-bindgen are present in this fork. That way, we could close the other and use this one. (see #21)
While you are at rewriting a lot of things, you could use https://github.com/KyleMayes/clang-rs.

@Yamakaky
Copy link
Contributor

Also, I made some refactoring myself, it could help you

@emilio
Copy link
Contributor Author

emilio commented Aug 25, 2016

@Yamakaky I planned to focus in the main architecture change.

Using clang-rs vs our hand-rolled thing would be great (never heard of that library before, but looks nice). That being said, this patch is already ~4k lines, and I don't want to make it larger unnecessarily.

Other parts that I don't plan to touch right now is argument parsing and such, feel free to help with it if you feel like it :-)

@Yamakaky
Copy link
Contributor

Maybe clang-rs should be integrated before you make too many changes. It would have been work for nothing.

@emilio
Copy link
Contributor Author

emilio commented Aug 25, 2016

Clang-rs is only a wrapper for types like Cursor or Type, it doesn't significantly change the architecture.

@Yamakaky
Copy link
Contributor

(inviting @KyleMayes in the conversation)

@emilio emilio force-pushed the v2 branch 2 times, most recently from 344a41f to 524cae9 Compare August 26, 2016 11:30
@emilio
Copy link
Contributor Author

emilio commented Aug 26, 2016

Current status: all the tests produce compilable code, I've been able to generate bindings for gecko without crashing, parsing all the relevant stl stuff and mfbt (obviously, the resulting code didn't compile due to sfinae and similar stuff).

I hope to be able to fully support c++ namespaces, though I'm going to keep working in getting the features up to date with current bindgen before doing more tests like that.

@bors-servo
Copy link

☔ The latest upstream changes (presumably #44) made this pull request unmergeable. Please resolve the merge conflicts.

@emilio emilio force-pushed the v2 branch 3 times, most recently from 532fd14 to de9eb58 Compare September 2, 2016 06:30
@emilio
Copy link
Contributor Author

emilio commented Sep 2, 2016

Hey, I've heard over there that this passes some tests already :-)

@emilio emilio force-pushed the v2 branch 2 times, most recently from f7adf26 to c2aaa08 Compare September 12, 2016 23:59
@emilio emilio force-pushed the v2 branch 4 times, most recently from 5a9b443 to 34a748a Compare September 13, 2016 19:29
// fn clone(&self) -> Self { *self }
// doesn't work for templates.
//
// It's not hard to fix though.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File an easy issue? :)

constness: ast::Constness::NotConst,
};

// TODO: This needs to be kept in sync, so we'd be better unifying this,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept in sync with what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the other loop that decides argument naming. Clarified it in the comment.

fn collect_types(&self,
context: &BindgenContext,
types: &mut ItemSet,
extra: &Self::Extra);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler to pass Self::Extra by value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought so, but that implies either parameterizing the trait and type on a lifetime and whatnot, or pass Items by value, which is both inefficient and not the intended thing. I tried to keep this simple since extra uses to be just the Item that refers to us, but if that changes we can refactor the interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

}
}

fn contains_parent(ctx: &BindgenContext, types: &ItemSet, id: ItemId) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: is_recursive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... This doesn't check for any kind of recursion, this only checks that a given parent of ours isn't already whitelisted, since generating code for it will generate code for us anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

self.bitfield
}

pub fn mutable(&self) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: is_mutable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure :)

}
}

pub fn hide(&self) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: is_hidden?

self.hide
}

pub fn opaque(&self) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: is_opaque?

self.opaque
}

pub fn use_as(&self) -> Option<&str> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used to implement the replaces annotation. I've given it a more meaningful name and a doc comment :)

self.use_as.as_ref().map(|s| &**s)
}

pub fn no_copy(&self) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this method, and the name should probably not be a negation.

self.no_copy
}

pub fn private_fields(&self) -> Option<bool> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: has_private_fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not indicate whether a type has private fields, but whether it should generate private fields. I'll give it a meaningful name.

Copy link
Contributor

@nox nox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started the review, but I am far from finishing it.

@emilio
Copy link
Contributor Author

emilio commented Sep 15, 2016

I addressed a bunch of review comments, and cleaned up a bit more. Pushed as to-squash commits to ease the review.

Copy link
Contributor

@nox nox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much to say about it now, the changes look correct but it's not like I can check every single line in details.

Squash and I'll r+.

/// This is done due to aster's lack of pointer builder, I guess I should PR
/// there.
trait ToPtr {
fn to_ptr(self, is_const: bool, span: Span) -> P<ast::Ty>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

fn collect_types(&self,
context: &BindgenContext,
types: &mut ItemSet,
extra: &Self::Extra);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

}
}

fn contains_parent(ctx: &BindgenContext, types: &ItemSet, id: ItemId) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

TL;DR: The binding generator is a mess as of right now. At first it was funny
(in a "this is challenging" sense) to improve on it, but this is not
sustainable.

The truth is that the current architecture of the binding generator is a huge
pile of hacks, so these few days I've been working on rewriting it with a few
goals.

 1) Have the hacks as contained and identified as possible. They're sometimes
    needed because how clang exposes the AST, but ideally those hacks are well
    identified and don't interact randomly with each others.

    As an example, in the current bindgen when scanning the parameters of a
    function that references a struct clones all the struct information, then if
    the struct name changes (because we mangle it), everything breaks.

 2) Support extending the bindgen output without having to deal with clang. The
    way I'm aiming to do this is separating completely the parsing stage from
    the code generation one, and providing a single id for each item the binding
    generator provides.

 3) No more random mutation of the internal representation from anywhere. That
    means no more Rc<RefCell<T>>, no more random circular references, no more
    borrow_state... nothing.

 4) No more deduplication of declarations before code generation.

    Current bindgen has a stage, called `tag_dup_decl`[1], that takes care of
    deduplicating declarations. That's completely buggy, and for C++ it's a
    complete mess, since we YOLO modify the world.

    I've managed to take rid of this using the clang canonical declaration, and
    the definition, to avoid scanning any type/item twice.

 5) Code generation should not modify any internal data structure. It can lookup
    things, traverse whatever it needs, but not modifying randomly.

 6) Each item should have a canonical name, and a single source of mangling
    logic, and that should be computed from the inmutable state, at code
    generation.

    I've put a few canonical_name stuff in the code generation phase, but it's
    still not complete, and should change if I implement namespaces.

Improvements pending until this can land:

 1) Add support for missing core stuff, mainly generating functions (note that
    we parse the signatures for types correctly though), bitfields, generating
    C++ methods.

 2) Add support for the necessary features that were added to work around some
    C++ pitfalls, like opaque types, etc...

 3) Add support for the sugar that Manish added recently.

 4) Optionally (and I guess this can land without it, because basically nobody
    uses it since it's so buggy), bring back namespace support.

These are not completely trivial, but I think I can do them quite easily with
the current architecture.

I'm putting the current state of affairs here as a request for comments... Any
thoughts? Note that there are still a few smells I want to eventually
re-redesign, like the ParseError::Recurse thing, but until that happens I'm
way happier with this kind of architecture.

I'm keeping the old `parser.rs` and `gen.rs` in tree just for reference while I
code, but they will go away.

[1]: https://github.com/Yamakaky/rust-bindgen/blob/master/src/gen.rs#L448
@emilio
Copy link
Contributor Author

emilio commented Sep 21, 2016

review ping @nox, is this good to land?

@nox
Copy link
Contributor

nox commented Sep 21, 2016

@bors-servo r+

@bors-servo
Copy link

📌 Commit da69afd has been approved by nox

@bors-servo
Copy link

⚡ Test exempted - status

@bors-servo bors-servo merged commit da69afd into rust-lang:master Sep 21, 2016
bors-servo pushed a commit that referenced this pull request Sep 21, 2016
Rewrite the core of the binding generator.

TL;DR: The binding generator is a mess as of right now. At first it was funny
(in a "this is challenging" sense) to improve on it, but this is not
sustainable.

The truth is that the current architecture of the binding generator is a huge
pile of hacks, so these few days I've been working on rewriting it with a few
goals.

 1) Have the hacks as contained and identified as possible. They're sometimes
    needed because how clang exposes the AST, but ideally those hacks are well
    identified and don't interact randomly with each others.

    As an example, in the current bindgen when scanning the parameters of a
    function that references a struct clones all the struct information, then if
    the struct name changes (because we mangle it), everything breaks.

 2) Support extending the bindgen output without having to deal with clang. The
    way I'm aiming to do this is separating completely the parsing stage from
    the code generation one, and providing a single id for each item the binding
    generator provides.

 3) No more random mutation of the internal representation from anywhere. That
    means no more `Rc<RefCell<T>>`, no more random circular references, no more
    borrow_state... nothing.

 4) No more deduplication of declarations before code generation.

    Current bindgen has a stage, called `tag_dup_decl`[1], that takes care of
    deduplicating declarations. That's completely buggy, and for C++ it's a
    complete mess, since we YOLO modify the world.

    I've managed to take rid of this using the clang canonical declaration, and
    the definition, to avoid scanning any type/item twice.

 5) Code generation should not modify any internal data structure. It can lookup
    things, traverse whatever it needs, but not modifying randomly.

 6) Each item should have a canonical name, and a single source of mangling
    logic, and that should be computed from the immutable state, at code
    generation.

    I've put a few canonical_name stuff in the code generation phase, but it's
    still not complete, and should change if I implement namespaces.

Improvements pending until this can land:

 1) Add support for missing core stuff, mainly generating functions (note that
    we parse the signatures for types correctly though), bitfields, generating
    C++ methods.

 2) Add support for the necessary features that were added to work around some
    C++ pitfalls, like opaque types, etc...

 3) Add support for the sugar that Manish added recently.

 4) Optionally (and I guess this can land without it, because basically nobody
    uses it since it's so buggy), bring back namespace support.

These are not completely trivial, but I think I can do them quite easily with
the current architecture.

I'm putting the current state of affairs here as a request for comments...
 Note that there are still a few smells I want to eventually
re-redesign, like the ParseError::Recurse thing, but until that happens I'm
way happier with this kind of architecture.

I'm keeping the old `parser.rs` and `gen.rs` in tree just for reference while I
code, but they will go away.

[1]: https://github.com/Yamakaky/rust-bindgen/blob/master/src/gen.rs#L448
@emilio emilio mentioned this pull request Sep 22, 2016
@emilio emilio deleted the v2 branch September 22, 2016 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants