-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add script to re-namespace .proto files for internal use in public libraries #207
Conversation
Hmm....the producer and consumer have to agree on the message content and for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main thought is, if this is useful can we do the following:
- move it to a
tools
folder as opposed to being in the proto folder - add a pre-commit test that ensures that it runs (and proto/buf runs successfully and as expected afterwards).
It blows my mind that this is the best option for some languages...
Yeah, I realized after submitting this that, based on the error message, this is probably caused by different versions of the protobuf library itself in a single link. If that's the case, unfortunately there's bugger-all we can do about it other than recommend against using the official protobuf library... Regardless of this particular error though, message type conflicts between different Substrait versions would give similar errors. Even if the conflict is between two completely different, incompatible Substrait versions, a user still shouldn't be getting errors from protobuf until they try to deserialize a plan generated by an incompatible version. The primary use case here being to support multiple incompatible versions of Substrait at once, or simply just linking a bunch of libraries that happen to link to Substrait under the hood without actually using Substrait.
Unless you mean protobuf itself, AFAIK this applies to every programming language, whenever the official protobuf C library is used under the hood. That includes at least C++ and Python; I wouldn't know about the other ones. I would love to be wrong about this, because it's the most egregious case of "my library is special and should therefore dictate your entire workflow" that I know of.
|
8b2800d
to
a8b6fd6
Compare
@westonpace , you good on this change? |
I'm not really for or against this change. I have no need to import multiple Substrait versions at the same time and, perhaps the validator needs that, but I'm not sure what other use cases there are. I'm still concerned that this approach will not be able of handling google.protobuf.Any which encodes the namespace name as part of the message. As for the general oddities of protobuf, I know we have encountered this with Flight. I think we've avoided this in Arrow with:
Even then, I think users can still run into trouble if they have different versions of Arrow, so I don't know that things are entirely solved. Also, I've been seriously wanting to play around with nanopb to see if it helps with all of this. |
Ah, sorry, I think I misunderstood you before. To paraphrase, you're questioning whether a protobuf Anyway, I'm personally also not 100% sure this is actually needed. I've tried to break libprotobuf here to confirm; it builds a dummy 0.2.0 producer lib, a dummy 0.3.0 consumer lib, and then links them together. However, of course it works fine in those laboratory conditions. Honestly, I'm pretty clueless about what does and doesn't break libprotobuf at this point. All I know is that it's a royal PITA because, every single time I've used libprotobuf in practice, it found a way to break itself in creative ways practically the second I start feeling somewhat confident that I implemented things right for once. The only instance where protobuf has worked for me consistently was for the Rust part of the validator, i.e. when I wasn't using the official library. At this point I'm just trying to make sacrifices to the protobuf gods in the hopes of not being smote by them every other day. I don't know. Maybe we should just close and/or leave this PR be until I or someone else finds a use case where this actually resolves a problem. I'm not sure if it did for @gforsyth; I'd imagine it'd take some refactoring to put it to use, and I think he worked around the problem by forcing Python protobuf to use the pure-python version of the library via the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to put this is a request changes state since it isn't clear whether we should merge it. Let's keep in a PR for now and see what requests/demands we see. Since it is basically independent, people can use it even if it is not merged.
If people are using this, please comment here so we can keep track of demand.
I'm going to bump this up for continued discussion. We updated the bundled version of substrait in I can imagine a very reasonable scenario in which producer A has substrait version 0.8 and serializes a Plan that is simple (say TPC-H query 1) that makes use of message types that were all present in substrait version 0.3. We probably can't break |
|
@gforsyth what's the latest on this in the ibis-substrait world? I seem to recall we did end up breaking |
Hey @westonpace -- I'm pretty happy with it. I've had no further headaches around colliding protobuf definitions. I'm generally a +1 on packages namespacing their substrait protos and this script will do that for you. |
@jvanstraten I think we should merge this in. Do you want to address the conflicts? If not, I think I can do it. |
Adds a script that can rewrite the protobuf files with a namespace prefix, such as arrow.substrait. When public libraries generate their own internal bindings and don't do this, users of said libraries can get nasty surprises from the protobuf library (segfaults, incomprehensible exceptions, or just wrong behavior) if there is even the slightest difference between different instances of the substrait namespace, even if those differences are otherwise just non-breaking changes.
a8b6fd6
to
4d6aeb8
Compare
@westonpace Should be ready to be reviewed & merged again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jacques-n this will need your review to proceed since you put it in a request changes state. I am content to merge this and would like to start closing up some of these outstanding PRs. I think this sort of renaming has seen proven usage in both ibis-substrait and the validator at this point which is enough, in my mind, to at least tacitly recommend the approach by including it in a tools folder.
In this interest of closing this long outstanding issue I am going to bypass github's protections and merge. I believe we have satisfied the requirements in the governance policy given this is not a change in the spec so I think we're just being blocked by Github annoyance at this point. |
ACTION NEEDED Substrait follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
This adds a script that can rewrite the protobuf files with a namespace prefix, such as arrow.substrait. When public libraries generate their own internal bindings and don't do this, users of said libraries can get nasty surprises from the protobuf library (segfaults, incomprehensible exceptions, or just wrong behavior) if there is even the slightest difference between different instances of substrait's message types, even if those differences are otherwise just non-breaking changes. I can't say I understand completely when or why protobuf's global message type registry is shared between libraries, but I'd say it's better to be safe than sorry and to strongly recommend that public libraries only define proto messages within their own namespace. AFAIK protoc can't do this on its own, so having this script makes it relatively easy to do so.
As an example of just how inane protobuf's error messages are, copypasting a Python exception @gforsyth got:
and apparently, swapping the import order "fixes" the error (I'm sure it's still broken, it just doesn't throw).