protoc-gen-go: unexport XXX_ fields from generated types #276

bcmills · 2017-01-11T20:26:30Z

The exported XXX_ fields on generated message structs are awkward. They make "obvious" reflection code over proto messages easy to get wrong (by not skipping the XXX_ fields), and they pollute the documentation of the generated packages with irrelevant implementation details.

I believe it is possible to unexport them. They appear to need to be exported due to the restriction on reflect.Value.Interface of unexported fields, but we can observe that every nested proto.Message is reached through only exported fields or XXX_ fields.

One technique we could use to access the fields after unexporting them is double-embedding. Start with an exported type in the proto package with an unexported (pointer) accessor method. Add one (unexported) struct type per generated proto package which embeds the exported struct. Embed this unexported struct in each message which needs the corresponding field.

The unexported struct field is itself unexported, but since that embeds a struct with an unexported method defined in the proto package it is accessible within proto via type-assertion to an interface containing the method.

A rough sketch: https://play.golang.org/p/yDjZZtGmB6

The text was updated successfully, but these errors were encountered:

bcmills · 2017-01-11T20:37:03Z

@cherrymui

bcmills · 2017-01-11T21:01:07Z

(Note that the accessor code could be significantly simpler if golang/go#18617 is accepted.)

rsc · 2017-01-17T20:03:22Z

That works if we were using v.Interface() but I think most of the time we are using v.Field(i).Addr().UnsafePointer() or v.Field(i).Set(foo), and reflect will reject those.

rsc · 2017-01-17T20:10:30Z

More generally, I do think this is an interesting thing to keep in mind, but we should also figure out where we are headed. For example, hypothetically, if we move to completely generated code, then the XXX_ fields can be xxx_ fields directly, with no problem. And again, hypothetically, if we move to completely unsafe code (no safe fallback for App Engine), the XXX_ fields can be xxx_ fields directly too, since we can write to them using unsafe.Pointer arithmetic.

bcmills · 2017-01-17T20:25:41Z

I think most of the time we are using v.Field(i).Addr().UnsafePointer() or v.Field(i).Set(foo), and reflect will reject those.

We can use v.Interface() with UnsafePointer fairly efficiently: we just need a fast concurrent map (golang/go#18177) and a slow-path for the first read of a given message type:

func unsafeGetUnrecognized(v reflect.Value) (*XXX_InternalUnrecognized, bool) {
	if v.IsNil() {
		return nil, false
	}

	offset, ok := unrecOffsets.Load(v.Type())
	if ok {
		addr := uintptr(v.Addr().UnsafePointer()) + offset.(uintptr)
		return (*XXX_InternalUnrecognized)(unsafe.Pointer(addr))
	}

	w, ok := v.Interface().(interface {
		unrecognized() *XXX_InternalUnrecognized
	})
	if !ok {
		return nil, false
	}

	p := w.unrecognized()
	offset = uintptr(unsafe.Pointer(p)) - uintptr(v.Addr().UnsafePointer())
	unrecOffsets.LoadOrStore(v.Type(), offset)
	return p, true
}

That said, it's not obvious to me that that's even necessary: if we're already paying the overhead of reflect.Value and using unsafe, it's likely we can optimize the lookup in some other way.

The Set case is trivial: since getUnrecognized returns a pointer, we write it as:

p, _ := getUnrecognized(v.Interface())
*p = foo

bcmills · 2017-01-17T20:27:01Z

More generally, I do think this is an interesting thing to keep in mind, but we should also figure out where we are headed.

Agreed. I think we should unexport these fields no matter what, but the implementation details depend heavily on how we decide to optimize the rest of the package.

awalterschulze · 2017-01-18T10:44:48Z

Is there a possibility of golang/protobuf starting to generate some marshaling and unmarshaling code?
I would recommend this over any use of unsafe, since then protocol buffers doesn't just play nice with appengine, but also gopherjs.
Also generated code is much faster, which is one of the reasons people flock to gogoprotobuf.
It would also be great if the generated code didn't depend on the proto package.
But this is just my personal wish, one less dependency for users of a package using protobufs.

rsc · 2017-01-18T16:07:16Z

@awalterschulze We're looking into ways to speed up marshal and unmarshal. Entirely generated code seems unlikely to be the right end point (too much code, not enough performance win), but we're evaluating multiple points in the solution space. Code not depending on the proto package at all is an interesting idea, although that implies even more generated code. I don't think that was on our radar.

awalterschulze · 2017-01-18T16:29:08Z

I am moving the conversation to another issue, since I feel I am hijacking the current issue.

bcmills · 2017-01-18T16:43:51Z

@awalterschulze It's at least partly relevant to this discussion, because cutting the dependency on the proto package would imply that we can't use the double-embedding trick I proposed to unexport the extension and unrecognized fields. We'd need to figure out some other strategy for implementing those features, or else commit to keeping the implementation-detail leakage in the API.

awalterschulze · 2017-01-18T16:55:10Z

I know at least that the double embedding tricked has caused me a lot of work when it was done for extensions. So I would opt for leaving this as is from a purely selfish point of view.
Changing this publicly available fieldname definitely breaks backwards compatibility. At least for me, I will definitely have to change quite a few code generation plugins :)

But thats probably not a good enough reason not to do this.

bcmills · 2017-01-18T17:07:04Z

Changing this publicly available fieldname definitely breaks backwards compatibility.

That's true, but note that these fields are already excluded from the compatibility guidelines:

[W]e reserve the right to make breaking changes without notice for the following reasons:
[…]

Adding, removing, or changing methods or fields in generated structs that start with XXX. These parts of the generated code are exported out of necessity, but should not be considered part of the public API.

So pushing third-party protobuf tools to use only the stable part of the API is probably a good thing anyway.

awalterschulze · 2017-01-18T17:11:55Z

Yes fair enough. I'll do the work :)

Only using the compatible parts is not possible for all cases. To do better marshaling and unmarshaling testing I would like to be able to generate messages with populated XXX_recognized fields. So my plugin to generate randomly populated messages, needs to be able access that field.
The Equal method generator also needs access to this field for more obvious reasons.
The GoString method plugin also needs to print out this field.
The generated marshaler and unmarshaler also needs to access this field.
etc.

bcmills · 2017-01-18T17:30:08Z

To do better marshaling and unmarshaling testing I would like to be able to generate messages with populated XXX_recognized fields. So my plugin to generate randomly populated messages, needs to be able access that field.

Couldn't you populate the exported fields, marshal the message, and inject pre-serialized data for unrecognized fields, and then unmarshal the message? It's an extra marshal/unmarshal round-trip, but that doesn't seem terrible for a test utility and allows you to work with only the exported API (and the documented proto encoding spec).

The Equal method generator also needs access to this field for more obvious reasons.
The GoString method plugin also needs to print out this field.
The generated marshaler and unmarshaler also needs to access this field.

All of those are implementation details, though: they're features that are normally internal to functions in the proto package.

Also note that with the double-embedding approach, code within the generated packages would still have access to the (unexported) second-embedding struct and its fields. So if we're careful in how we define the first-embedding API, it may still be possible for gogoprotobuf generated methods to access that data.

I would like to see the proto package move toward a stronger split between the public API and generated-code hooks. (That will hopefully be more feasible once golang/go#18130 is addressed.) With that approach, I could imagine moving the XXX fields and their types out to exported types in a generated and/or internal package instead of making them unexported types within proto. The generated or internal package could export functions to efficiently access the fields hidden by double-embedding. That would allow for (unstable) hooks for tools like gogoprotobuf to explicitly access the internal API but without polluting the documentation for the (stable) public API.

awalterschulze · 2017-01-18T17:48:43Z

The current way allows me to inject those values on several levels in deeply nested messages.

The need for the proto package is a sore point for me and one that I would like to remove in future, at least as an option for gogoprotobuf. Even with a limited usecase I still think removing this dependency is of great value in simplifying some installations.

It would be great if the resulting xxx field can still be touched be generated code without going into the proto package.

I would really love to one day have an extension to not import the proto package. This obviously won't work for extensions and any but hopefully for all other usecases.

bcmills · 2017-01-18T18:02:06Z

The need for the proto package is a sore point for me and one that I would like to remove in future, at least as an option for gogoprotobuf. Even with a limited usecase I still think removing this dependency is of great value in simplifying some installations.

That should be straightforward for proto3, since it uses Any instead of extensions and explicitly discards unknown fields on deserialization. You'd need to elide the Register call (which would break ptypes.DynamicAny), but otherwise it ought to work fine.

It may be possible for proto2 messages that don't reserve extension fields, but we'd need to figure out what to do about unknown fields: proto2 requires that messages preserve unknown fields, but at the moment we don't have an exported, stable API for accessing or manipulating them and I'm not sure that we should.

awalterschulze · 2017-01-18T19:13:12Z

Exactly if I just stop Register messages only extensions and any stop working :)

But if this change goes through in such a way that a proto package is needed, then my plan only works for proto3 which is quite a shame.

So If something has to change, them I am for the publically available XXX_ becoming a private xxx_

As I understand XXX_InternalUnrecognized would require the use a proto package?

rsc · 2017-01-18T19:17:03Z

Why do you want to not have a proto package? It seems like there's an underlying root cause there (like maybe package management) that we should be addressing instead.

awalterschulze · 2017-01-18T19:45:15Z

dependencies, given even the best package management, still need to be managed.
Less is more :)

But there is also confusion when using gogoprotobuf and golang/protobuf.
Ideally you would only want to use one and this can typically be done, but when you start having a dependency like grpc, then you are suddenly important golang/protobuf by default. Now if for some reason I decided not to generate code, then I need to make sure that I call the right proto package. I haven't had this problem, but I have run into people who have. And I feel their frustration. This gets even more confusing given two separate jsonpb packages and dealing with grpc-gateway.

Also vendoring a protobuf project has its limits.

wangxx2026 · 2018-07-01T08:42:24Z

latest code occurs, again

sh0umik · 2018-07-04T14:36:03Z

XXX_ fields are back again ... :D

dsnet · 2018-08-03T18:48:42Z

Renaming as "unexported XXX_ fields" since the current title suggests that "XXX_" fields will one day disappear; which is not true. The generator will need to inject some fields necessary for proto functionality. The goal is to have them be unexported and not user-visible either through documentation or by direct field access.

achandak123 · 2019-04-11T01:46:20Z

Any update on this issue 👍

james-lawrence · 2019-06-17T01:30:48Z

this really needs to be resolved. completely makes testing protobuf structs a nightmare.

dsnet · 2019-06-17T02:09:45Z

this really needs to be resolved. completely makes testing protobuf structs a nightmare.

Please explain?

james-lawrence · 2019-06-24T22:50:07Z

fields that get populated with data during serialization, are nested, and no clean way to reset them to zero. means that one can't compare them with a structure made in a test without jumping through hoops.

dsnet · 2019-06-24T22:59:07Z

fields that get populated with data during serialization, are nested, and no clean way to reset them to zero. means that one can't compare them with a structure made in a test without jumping through hoops.

That has nothing to do this this issue though. This issue is about unexporting the XXX_ fields, which doesn't change the fact that the internal fields will still be populated; and doing something like reflect.DeepEqual on protobuf messages won't work.

You want something like #762

james-lawrence · 2019-06-24T23:24:56Z

it use to work just fine. ;) but yes something like #762 would work.

We modify protoc-gen-go to stop generating exported XXX fields. The unsafe implementation is unaffected by this change since unsafe can access fields regardless of visibility. However, for the purego implementation, we need to respect Go visibility rules as enforced by the reflect package. We work around this by generating a exporter function that given a reference to the message and the field to export, returns a reference to the unexported field value. This exporter function is protected by a constant such that it is not linked into the final binary in non-purego build environment. Updates #276 Change-Id: Idf5c1f158973fa1c61187ff41440acb21c5dac94 Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185141 Reviewed-by: Damien Neil <[email protected]>

dsnet · 2020-03-03T05:22:49Z

The v1.20.0 release of protoc-gen-go no longer generates XXX fields and methods. Internal fields continue to exist, but are now unexported.

bcmills added the enhancement label Jan 11, 2017

bcmills mentioned this issue Jan 11, 2017

proposal: spec: use zero receiver for value method embedded via nil pointer golang/go#18617

Open

awalterschulze mentioned this issue Jan 18, 2017

protoc-gen-go: fully generate marshaling and unmarshaling code #280

Open

bcmills mentioned this issue Sep 27, 2017

proposal: spec: remove embedded struct golang/go#22013

Closed

dsnet changed the title ~~Remove XXX_ fields from generated types.~~ protoc-gen-go: Remove XXX_ fields from generated types. Feb 14, 2018

dsnet changed the title ~~protoc-gen-go: Remove XXX_ fields from generated types.~~ protoc-gen-go: remove XXX_ fields from generated types Feb 14, 2018

dsnet mentioned this issue Mar 12, 2018

protoc-gen-go: support go_name option to specify Go identifier #555

Closed

dsnet added the breaking-change requires making a breaking API change label Mar 14, 2018

dsnet mentioned this issue May 9, 2018

protoc-gen-go: remove/inject field tags for XXX fields #597

Closed

sitano mentioned this issue May 15, 2018

protoc-gen-go: XXX_ fields break code back compatibility #607

Closed

dsnet mentioned this issue May 22, 2018

updated, now generated code won't compile and fields marked with xxx_ ? #615

Closed

dsnet mentioned this issue Jul 30, 2018

XXX_unrecognized broke code #594

Closed

dsnet added the api-v2 label Aug 2, 2018

dsnet changed the title ~~protoc-gen-go: remove XXX_ fields from generated types~~ protoc-gen-go: unexport XXX_ fields from generated types Aug 3, 2018

This was referenced Sep 15, 2018

protoc-gen-go: simplify generated oneof implementation #708

Closed

bigquery: infer schema from protoc generated struct googleapis/google-cloud-go#1119

Open

yurishkuro mentioned this issue Dec 23, 2018

Issues with upgrading gogo/protobuf (lots of XXX_ fields/methods) jaegertracing/jaeger#1262

Open

jeanbza mentioned this issue Mar 21, 2019

Created protoDocument from proto.Message preserves generated XXX_* fields. googleapis/google-cloud-go#1367

Closed

dsnet mentioned this issue Jun 1, 2019

XXX_Marshal returns wrong encoded byte slice #863

Closed

dsnet added pending-v2 and removed breaking-change requires making a breaking API change enhancement labels Jul 8, 2019

dsnet added this to the v2 release milestone Aug 21, 2019

dsnet closed this as completed Mar 3, 2020

bcmills mentioned this issue Mar 12, 2020

proposal: cmd/doc: add "// Unstable:" prefix convention golang/go#34409

Open

golang locked as resolved and limited conversation to collaborators Jun 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

protoc-gen-go: unexport XXX_ fields from generated types #276

protoc-gen-go: unexport XXX_ fields from generated types #276

bcmills commented Jan 11, 2017 •

edited

Loading

bcmills commented Jan 11, 2017

bcmills commented Jan 11, 2017

rsc commented Jan 17, 2017

rsc commented Jan 17, 2017

bcmills commented Jan 17, 2017 •

edited

Loading

bcmills commented Jan 17, 2017

awalterschulze commented Jan 18, 2017

rsc commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

bcmills commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

bcmills commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

bcmills commented Jan 18, 2017 •

edited

Loading

awalterschulze commented Jan 18, 2017 •

edited

Loading

bcmills commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

rsc commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

wangxx2026 commented Jul 1, 2018

sh0umik commented Jul 4, 2018

dsnet commented Aug 3, 2018

achandak123 commented Apr 11, 2019

james-lawrence commented Jun 17, 2019

dsnet commented Jun 17, 2019

james-lawrence commented Jun 24, 2019

dsnet commented Jun 24, 2019 •

edited

Loading

james-lawrence commented Jun 24, 2019

dsnet commented Mar 3, 2020

protoc-gen-go: unexport XXX_ fields from generated types #276

protoc-gen-go: unexport XXX_ fields from generated types #276

Comments

bcmills commented Jan 11, 2017 • edited Loading

bcmills commented Jan 11, 2017

bcmills commented Jan 11, 2017

rsc commented Jan 17, 2017

rsc commented Jan 17, 2017

bcmills commented Jan 17, 2017 • edited Loading

bcmills commented Jan 17, 2017

awalterschulze commented Jan 18, 2017

rsc commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

bcmills commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

bcmills commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

bcmills commented Jan 18, 2017 • edited Loading

awalterschulze commented Jan 18, 2017 • edited Loading

bcmills commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

rsc commented Jan 18, 2017

awalterschulze commented Jan 18, 2017

wangxx2026 commented Jul 1, 2018

sh0umik commented Jul 4, 2018

dsnet commented Aug 3, 2018

achandak123 commented Apr 11, 2019

james-lawrence commented Jun 17, 2019

dsnet commented Jun 17, 2019

james-lawrence commented Jun 24, 2019

dsnet commented Jun 24, 2019 • edited Loading

james-lawrence commented Jun 24, 2019

dsnet commented Mar 3, 2020

bcmills commented Jan 11, 2017 •

edited

Loading

bcmills commented Jan 17, 2017 •

edited

Loading

bcmills commented Jan 18, 2017 •

edited

Loading

awalterschulze commented Jan 18, 2017 •

edited

Loading

dsnet commented Jun 24, 2019 •

edited

Loading