Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protoc-gen-go: unexport XXX_ fields from generated types #276

Closed
bcmills opened this issue Jan 11, 2017 · 29 comments
Closed

protoc-gen-go: unexport XXX_ fields from generated types #276

bcmills opened this issue Jan 11, 2017 · 29 comments

Comments

@bcmills
Copy link
Contributor

bcmills commented Jan 11, 2017

The exported XXX_ fields on generated message structs are awkward. They make "obvious" reflection code over proto messages easy to get wrong (by not skipping the XXX_ fields), and they pollute the documentation of the generated packages with irrelevant implementation details.

I believe it is possible to unexport them. They appear to need to be exported due to the restriction on reflect.Value.Interface of unexported fields, but we can observe that every nested proto.Message is reached through only exported fields or XXX_ fields.

One technique we could use to access the fields after unexporting them is double-embedding. Start with an exported type in the proto package with an unexported (pointer) accessor method. Add one (unexported) struct type per generated proto package which embeds the exported struct. Embed this unexported struct in each message which needs the corresponding field.

The unexported struct field is itself unexported, but since that embeds a struct with an unexported method defined in the proto package it is accessible within proto via type-assertion to an interface containing the method.

A rough sketch: https://play.golang.org/p/yDjZZtGmB6

@bcmills
Copy link
Contributor Author

bcmills commented Jan 11, 2017

@cherrymui

@bcmills
Copy link
Contributor Author

bcmills commented Jan 11, 2017

(Note that the accessor code could be significantly simpler if golang/go#18617 is accepted.)

@rsc
Copy link
Contributor

rsc commented Jan 17, 2017

That works if we were using v.Interface() but I think most of the time we are using v.Field(i).Addr().UnsafePointer() or v.Field(i).Set(foo), and reflect will reject those.

@rsc
Copy link
Contributor

rsc commented Jan 17, 2017

More generally, I do think this is an interesting thing to keep in mind, but we should also figure out where we are headed. For example, hypothetically, if we move to completely generated code, then the XXX_ fields can be xxx_ fields directly, with no problem. And again, hypothetically, if we move to completely unsafe code (no safe fallback for App Engine), the XXX_ fields can be xxx_ fields directly too, since we can write to them using unsafe.Pointer arithmetic.

@bcmills
Copy link
Contributor Author

bcmills commented Jan 17, 2017

I think most of the time we are using v.Field(i).Addr().UnsafePointer() or v.Field(i).Set(foo), and reflect will reject those.

We can use v.Interface() with UnsafePointer fairly efficiently: we just need a fast concurrent map (golang/go#18177) and a slow-path for the first read of a given message type:

func unsafeGetUnrecognized(v reflect.Value) (*XXX_InternalUnrecognized, bool) {
	if v.IsNil() {
		return nil, false
	}

	offset, ok := unrecOffsets.Load(v.Type())
	if ok {
		addr := uintptr(v.Addr().UnsafePointer()) + offset.(uintptr)
		return (*XXX_InternalUnrecognized)(unsafe.Pointer(addr))
	}

	w, ok := v.Interface().(interface {
		unrecognized() *XXX_InternalUnrecognized
	})
	if !ok {
		return nil, false
	}

	p := w.unrecognized()
	offset = uintptr(unsafe.Pointer(p)) - uintptr(v.Addr().UnsafePointer())
	unrecOffsets.LoadOrStore(v.Type(), offset)
	return p, true
}

That said, it's not obvious to me that that's even necessary: if we're already paying the overhead of reflect.Value and using unsafe, it's likely we can optimize the lookup in some other way.

The Set case is trivial: since getUnrecognized returns a pointer, we write it as:

p, _ := getUnrecognized(v.Interface())
*p = foo

@bcmills
Copy link
Contributor Author

bcmills commented Jan 17, 2017

More generally, I do think this is an interesting thing to keep in mind, but we should also figure out where we are headed.

Agreed. I think we should unexport these fields no matter what, but the implementation details depend heavily on how we decide to optimize the rest of the package.

@awalterschulze
Copy link

Is there a possibility of golang/protobuf starting to generate some marshaling and unmarshaling code?
I would recommend this over any use of unsafe, since then protocol buffers doesn't just play nice with appengine, but also gopherjs.
Also generated code is much faster, which is one of the reasons people flock to gogoprotobuf.
It would also be great if the generated code didn't depend on the proto package.
But this is just my personal wish, one less dependency for users of a package using protobufs.

@rsc
Copy link
Contributor

rsc commented Jan 18, 2017

@awalterschulze We're looking into ways to speed up marshal and unmarshal. Entirely generated code seems unlikely to be the right end point (too much code, not enough performance win), but we're evaluating multiple points in the solution space. Code not depending on the proto package at all is an interesting idea, although that implies even more generated code. I don't think that was on our radar.

@awalterschulze
Copy link

I am moving the conversation to another issue, since I feel I am hijacking the current issue.

@bcmills
Copy link
Contributor Author

bcmills commented Jan 18, 2017

@awalterschulze It's at least partly relevant to this discussion, because cutting the dependency on the proto package would imply that we can't use the double-embedding trick I proposed to unexport the extension and unrecognized fields. We'd need to figure out some other strategy for implementing those features, or else commit to keeping the implementation-detail leakage in the API.

@awalterschulze
Copy link

I know at least that the double embedding tricked has caused me a lot of work when it was done for extensions. So I would opt for leaving this as is from a purely selfish point of view.
Changing this publicly available fieldname definitely breaks backwards compatibility. At least for me, I will definitely have to change quite a few code generation plugins :)

But thats probably not a good enough reason not to do this.

@bcmills
Copy link
Contributor Author

bcmills commented Jan 18, 2017

Changing this publicly available fieldname definitely breaks backwards compatibility.

That's true, but note that these fields are already excluded from the compatibility guidelines:

[W]e reserve the right to make breaking changes without notice for the following reasons:
[…]

  • Adding, removing, or changing methods or fields in generated structs that start with XXX. These parts of the generated code are exported out of necessity, but should not be considered part of the public API.

So pushing third-party protobuf tools to use only the stable part of the API is probably a good thing anyway.

@awalterschulze
Copy link

Yes fair enough. I'll do the work :)

Only using the compatible parts is not possible for all cases. To do better marshaling and unmarshaling testing I would like to be able to generate messages with populated XXX_recognized fields. So my plugin to generate randomly populated messages, needs to be able access that field.
The Equal method generator also needs access to this field for more obvious reasons.
The GoString method plugin also needs to print out this field.
The generated marshaler and unmarshaler also needs to access this field.
etc.

@bcmills
Copy link
Contributor Author

bcmills commented Jan 18, 2017

To do better marshaling and unmarshaling testing I would like to be able to generate messages with populated XXX_recognized fields. So my plugin to generate randomly populated messages, needs to be able access that field.

Couldn't you populate the exported fields, marshal the message, and inject pre-serialized data for unrecognized fields, and then unmarshal the message? It's an extra marshal/unmarshal round-trip, but that doesn't seem terrible for a test utility and allows you to work with only the exported API (and the documented proto encoding spec).

The Equal method generator also needs access to this field for more obvious reasons.
The GoString method plugin also needs to print out this field.
The generated marshaler and unmarshaler also needs to access this field.

All of those are implementation details, though: they're features that are normally internal to functions in the proto package.

Also note that with the double-embedding approach, code within the generated packages would still have access to the (unexported) second-embedding struct and its fields. So if we're careful in how we define the first-embedding API, it may still be possible for gogoprotobuf generated methods to access that data.

I would like to see the proto package move toward a stronger split between the public API and generated-code hooks. (That will hopefully be more feasible once golang/go#18130 is addressed.) With that approach, I could imagine moving the XXX fields and their types out to exported types in a generated and/or internal package instead of making them unexported types within proto. The generated or internal package could export functions to efficiently access the fields hidden by double-embedding. That would allow for (unstable) hooks for tools like gogoprotobuf to explicitly access the internal API but without polluting the documentation for the (stable) public API.

@awalterschulze
Copy link

awalterschulze commented Jan 18, 2017

The current way allows me to inject those values on several levels in deeply nested messages.

The need for the proto package is a sore point for me and one that I would like to remove in future, at least as an option for gogoprotobuf. Even with a limited usecase I still think removing this dependency is of great value in simplifying some installations.

It would be great if the resulting xxx field can still be touched be generated code without going into the proto package.

I would really love to one day have an extension to not import the proto package. This obviously won't work for extensions and any but hopefully for all other usecases.

@bcmills
Copy link
Contributor Author

bcmills commented Jan 18, 2017

The need for the proto package is a sore point for me and one that I would like to remove in future, at least as an option for gogoprotobuf. Even with a limited usecase I still think removing this dependency is of great value in simplifying some installations.

That should be straightforward for proto3, since it uses Any instead of extensions and explicitly discards unknown fields on deserialization. You'd need to elide the Register call (which would break ptypes.DynamicAny), but otherwise it ought to work fine.

It may be possible for proto2 messages that don't reserve extension fields, but we'd need to figure out what to do about unknown fields: proto2 requires that messages preserve unknown fields, but at the moment we don't have an exported, stable API for accessing or manipulating them and I'm not sure that we should.

@awalterschulze
Copy link

Exactly if I just stop Register messages only extensions and any stop working :)

But if this change goes through in such a way that a proto package is needed, then my plan only works for proto3 which is quite a shame.

So If something has to change, them I am for the publically available XXX_ becoming a private xxx_

As I understand XXX_InternalUnrecognized would require the use a proto package?

@rsc
Copy link
Contributor

rsc commented Jan 18, 2017

Why do you want to not have a proto package? It seems like there's an underlying root cause there (like maybe package management) that we should be addressing instead.

@awalterschulze
Copy link

dependencies, given even the best package management, still need to be managed.
Less is more :)

But there is also confusion when using gogoprotobuf and golang/protobuf.
Ideally you would only want to use one and this can typically be done, but when you start having a dependency like grpc, then you are suddenly important golang/protobuf by default. Now if for some reason I decided not to generate code, then I need to make sure that I call the right proto package. I haven't had this problem, but I have run into people who have. And I feel their frustration. This gets even more confusing given two separate jsonpb packages and dealing with grpc-gateway.

Also vendoring a protobuf project has its limits.

@dsnet dsnet changed the title Remove XXX_ fields from generated types. protoc-gen-go: Remove XXX_ fields from generated types. Feb 14, 2018
@dsnet dsnet changed the title protoc-gen-go: Remove XXX_ fields from generated types. protoc-gen-go: remove XXX_ fields from generated types Feb 14, 2018
@dsnet dsnet added the breaking-change requires making a breaking API change label Mar 14, 2018
@wangxx2026
Copy link

latest code occurs, again

@sh0umik
Copy link

sh0umik commented Jul 4, 2018

XXX_ fields are back again ... :D

@dsnet dsnet added the api-v2 label Aug 2, 2018
@dsnet dsnet changed the title protoc-gen-go: remove XXX_ fields from generated types protoc-gen-go: unexport XXX_ fields from generated types Aug 3, 2018
@dsnet
Copy link
Member

dsnet commented Aug 3, 2018

Renaming as "unexported XXX_ fields" since the current title suggests that "XXX_" fields will one day disappear; which is not true. The generator will need to inject some fields necessary for proto functionality. The goal is to have them be unexported and not user-visible either through documentation or by direct field access.

@achandak123
Copy link

Any update on this issue 👍

@james-lawrence
Copy link

this really needs to be resolved. completely makes testing protobuf structs a nightmare.

@dsnet
Copy link
Member

dsnet commented Jun 17, 2019

this really needs to be resolved. completely makes testing protobuf structs a nightmare.

Please explain?

@james-lawrence
Copy link

fields that get populated with data during serialization, are nested, and no clean way to reset them to zero. means that one can't compare them with a structure made in a test without jumping through hoops.

@dsnet
Copy link
Member

dsnet commented Jun 24, 2019

fields that get populated with data during serialization, are nested, and no clean way to reset them to zero. means that one can't compare them with a structure made in a test without jumping through hoops.

That has nothing to do this this issue though. This issue is about unexporting the XXX_ fields, which doesn't change the fact that the internal fields will still be populated; and doing something like reflect.DeepEqual on protobuf messages won't work.

You want something like #762

@james-lawrence
Copy link

it use to work just fine. ;) but yes something like #762 would work.

dsnet added a commit that referenced this issue Jul 8, 2019
We modify protoc-gen-go to stop generating exported XXX fields.
The unsafe implementation is unaffected by this change since unsafe
can access fields regardless of visibility. However, for the purego
implementation, we need to respect Go visibility rules as enforced
by the reflect package.

We work around this by generating a exporter function that given
a reference to the message and the field to export, returns a reference
to the unexported field value. This exporter function is protected by
a constant such that it is not linked into the final binary in non-purego
build environment.

Updates #276

Change-Id: Idf5c1f158973fa1c61187ff41440acb21c5dac94
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185141
Reviewed-by: Damien Neil <[email protected]>
@dsnet dsnet added pending-v2 and removed breaking-change requires making a breaking API change enhancement labels Jul 8, 2019
@dsnet dsnet added this to the v2 release milestone Aug 21, 2019
@dsnet
Copy link
Member

dsnet commented Mar 3, 2020

The v1.20.0 release of protoc-gen-go no longer generates XXX fields and methods. Internal fields continue to exist, but are now unexported.

@dsnet dsnet closed this as completed Mar 3, 2020
@golang golang locked as resolved and limited conversation to collaborators Jun 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants