Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protoc-gen-go: support go_name option to specify Go identifier #555

Closed
dsnet opened this issue Mar 9, 2018 · 27 comments
Closed

protoc-gen-go: support go_name option to specify Go identifier #555

dsnet opened this issue Mar 9, 2018 · 27 comments
Labels
generator-proto-option involves generators respecting a proto option to control generated source output proposal proposal-hold
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Mar 9, 2018

Background

Identifiers in proto are not directly translatable into Go for several reasons:

  1. By convention, naming styles are different:
    • Protos uses CamelCase for messages, groups, enums, services, and methods
    • Protos uses snake_case for fields and oneof fields
    • Protos uses SCREAMING_SNAKE_CASE for enum values
    • Go uses CamelCase for everything
  2. As a language rule, Go uses an uppercase letter to indicate visibility. Protobuf has no such visibility rules and everything is assumed to be publicly visible to everything else.
  3. Proto allows you to declare an enum or message within a message. Thus, a Child message defined in the scope of a Parent message is named Parent.Child. However, Go has no language equivalent for this (see proposal: spec: support nested types or name spaces go#20467).

Status quo

When protoc-gen-go generates Go identifiers, it goes through some lexicographical transformation to convert proto naming conventions to Go naming conventions (see generator.CamelCase).

This lexicographical re-write works well most of the time, but is sub-par in the following cases:

  • It cannot handle abbreviations well: enable_gps => EnableGps
  • It can result in name conflicts:
    • foo and Foo are distinct identifiers in proto, but both are named Foo in Go to work around visibility rules.
    • Parent_Child and Parent.Child are distinct identifiers proto, but both are named Parent_Child in Go to work around Go's lack of nested declarations (see my concern protoc-gen-go: remove type name from generated enum #513).

Proposal

I propose that we add support for a go_name option. This is an option that all Go generators should be aware of and respect, with adoption starting with protoc-gen-go.

Example:

syntax = "proto3";

package example.goname;

message FooMessage {
    string enable_gps = 1 [go_name = "EnableGPS"]
}

The option can be applied to following: MessageOptions, FieldOptions, OneOfOptions, EnumOptions, and EnumValueOptions.

  • The go_name option for MessageOptions and EnumOptions must be supported by all generators since a proto file may specify a message or enum from another proto file by name. The generator must be able to reliably reference the target type.
    • Note that this bakes in the assumption that generated code will always have a top-level Go type for each message and enum proto type.
  • It is optional whether a generator respects the option for the other types, although protoc-gen-go will.
    • EnumValueOptions controls the enum variable name
    • FieldOptions and OneOfOptions controls the struct field name
      • For a hypothetical generator that emit messages with no fields, but only getters and setters, it up to the implementation what they want to do. Using SetX and GetX is a reasonable approach.

Optionally supported for: ServiceOptions and MethodOptions

  • protoc-gen-go does not directly support services. However, it is reasonable that grpc also respect these options.

They do not apply to: FileOptions and ExtensionRangeOptions.

  • Do not apply because there is no sensible meaning for go_name here. go_package is the file-level equivalent that controls naming and import path information for the Go package.

Secondly, I propose that we formalize the rules for how Go names are generated when the go_name option is lacking. Essentially, document the functionality of generator.CamelCase. This is one step in the direction of having a sensible ecosystem of various generators and/or plugins (#547).

Use-cases

  • Provide an opt-out when making breaking changes to the generator. For example, suppose we dropped the enum type prefix to fix protoc-gen-go: remove type name from generated enum #513, we would need to provide a way for users to continue using the old name.
  • For situations where automatic CamelCasing is sub-par, provides the user the ability to provide an idiomatic Go name (e.g., EnableGPS)
    • This supersede the customname, enum_customname and enumvalue_customname options in gogo/protobuf.
  • Satisfy the desire to generate unexported message and type names (see proposal: protoc-gen-go: support generating unexported types #501)
    • Intentionally choosing an unexported name means you are certain that nobody else imports your proto file.
  • Provides a workaround for rare situations of name conflicts

Design specifics

What is valid syntax for go_name?
The string must be a valid Go identifier per the language specification. If it is invalid, then the generator reports an error. It is the user's responsibility to ensure that Go identifiers do not conflict. A generator implementation may choose to report an error at generation time if such a conflict exists.

How does this work with nested messages and enums?
Suppose you have a proto file with message Parent { message Child { option go_name = "Foo"; } }. Is the type name of the inner message going to be Foo or Parent_Foo. I argue that the name is "Foo" as that is strictly more expressive than simply prefixing the parent name.

What happens when field names are unexported?
The current implementation of the proto package will crash under this situation. However, I argue that this behavior is just an implementation detail of the current package.

There are ways to workaround this. For example, you could generate a Reference method that returns a pointer to any field of a given unexported name.

func (m *FooMessage) Reference(name string) interface{} {
    switch name {
    case "field1": return &m.field1
    case "field2": return &m.field2
    case "field3": return &m.field3
    default:       panic(fmt.Sprintf("%T has no field named: %s", *m, name))
    }
}

The protoc package could type assert for this method and use it.

However, I would prefer not polluting the public API of generated messages any further. Thus, for the initial support for go_name in protoc-gen-go, we can intentionally report an error until some future time we decide to fully support this. However, other generators like protoc-gen-gogo may support this. I'm comfortable with the fact that it is at least possible.

I have several ideas for #276 that will be useful here.

What does it mean for field extensions?
TBD.

What does it mean for enum value defaults?
TBD.

What does it mean for fields belonging to a oneof?
TBD.

\cc @neild @rogpeppe @awalterschulze @htuch

@dsnet dsnet added the proposal label Mar 9, 2018
@dsnet dsnet added the generator-proto-option involves generators respecting a proto option to control generated source output label Mar 9, 2018
@neild
Copy link
Contributor

neild commented Mar 10, 2018

+1

Open question: Can we add a go_name field to FieldDescriptor et al., or does this need to be an extension option?

@awalterschulze
Copy link

awalterschulze commented Mar 10, 2018 via email

@rogpeppe
Copy link

@dsnet This looks like a nice potential solution for issue #501 and the other issues you mention, but I'm not entirely clear how it would look. Something like this?

message foo {
	option go_name = foo;
	string message_id = 1 [go_name = MessageID];
	repeated string actions = 2;
}

I think that it should complain if you try to provide a non-capitalized name for a message field because of the reflect issue.

@dsnet
Copy link
Member Author

dsnet commented Mar 12, 2018

For unexported fields, I updated the "Design specifics" section above. My answer is below:


The current implementation of the proto package will crash under this situation. However, I argue that this behavior is just an implementation detail of the current package.

There are ways to workaround this. For example, you could generate a Reference method that returns a pointer to any field of a given unexported name.

// Reference returns a pointer to m.name.
func (m *FooMessage) Reference(name string) interface{} {
    switch name {
    case "field1": return &m.field1
    case "field2": return &m.field2
    case "field3": return &m.field3
    default:       panic(fmt.Sprintf("%T.%s is not a valid field", *m, name))
    }
}

The protoc package could type assert for this method and use it. However, I would prefer not polluting the public API of generated messages any further. Thus, for the initial support for go_name, we can intentionally report an error until some future time we decide to fully support this. I'm comfortable with the fact that it is at least possible, but don't see much evidence that this is necessary.

I have several ideas for #276 that will be useful here.

@awalterschulze
Copy link

Sounds like a good plan :)
Could we make it a warning rather than an error?
Then users with methods are still allowed to have private fields.

Just trying to minimize the diff between gogoprotobuf and golang/protobuf, since go_name already goes such a long way.

@dsnet
Copy link
Member Author

dsnet commented Mar 12, 2018

Could we make it a warning rather than an error?

For which case? When the identifier is invalid? Or for private fields?

If the later, I think it's okay for protoc-gen-go to error out for now, but have protoc-gen-gogo support it. The alternative breeds more inconsistency if protoc-gen-go simply ignores it if the field is unexported. Thus, if protoc-gen-go ever supports it, then it will become a breaking change because some people were relying on the fact that it was a noop.

@awalterschulze
Copy link

I was commenting on private fields.

Its not the most important thing. This feature still brings us miles closer :)

@puellanivis
Copy link
Collaborator

For example, you could generate a Reference method that returns a pointer to any field of a given unexported name.

If I understand this proposed solution correctly, then because the intended change would require breaking the rules of reflection, it is proposed to make a function that just completely circumvents the intent of unexporting the fields in the first place.

i.e. these values are no longer really unexported, they’re just exported through circumlocution: *Struct.Reference("field").(*fieldType) = "this may as well be Struct.Field"

@rogpeppe
Copy link

@puellanivis I'm not saying it's a good idea, but FWIW I suspect that the Reference method could itself be unexported and made available to the protobuf code with a method expression.

@puellanivis
Copy link
Collaborator

There’s no way to protect such a Reference method that will not cause the same issues with the rules of exportation and the rules of reflection.

There is no way to open up access to these unexported fields that will not just present a public API to the world that defeats the entire intent of unexported fields.

If one wishes to break the rules of reflection, then one should just straight-forward break them (yes, it’s possible, no it’s not really straight-forward, this was by design, but there is a direct approach to do it), and take responsibility for having broken them. Reimplementing a new specification of export/unexport semantics is just entirely unnecessary.

If one finds oneself writing a large amount of complex code in order to work around the protections of a language, then either one has accidentally stumbled upon a really bad idea, and should rethink how it should be done; or one has intentionally stepped into a really bad idea, and should rethink how it should be done.

@rogpeppe
Copy link

@puellanivis I agree with almost all of that, except the first two paragraphs.

AFAICS it would be possible to protect something like that. The proto package respects the Marshaler and Unmarshaler interfaces, so a type that wants to use private fields could use an helper (unexported) type with public fields, copying all the private fields to/from a value of that type.

I don't think it would be a good idea though. I think it would be just fine to forbid using a go_name for a field (not a message) that is not capitalised. That fits with other Go encoder requirements (without a capitalised name, the JSON encoding won't work), and it's not generally an issue - an exported field doesn't necessarily pollute the public API of a package.

@dsnet
Copy link
Member Author

dsnet commented Mar 14, 2018

We're getting off track and have gone really deep in the thick weeds of the specific implementation of a feature that I intentionally said that protoc-gen-go would not implement for now.

@awalterschulze
Copy link

Yes, sorry that is my fault. I think go_name is great and having a rule to not support private fields is more than fine.
I was maybe trying to future proof it too much.

@awalterschulze
Copy link

Here is a great example of how gogoproto.customname is being used and will hopefully in future be replaced with go_name
https://github.com/docker/swarmkit/blob/master/cmd/protoc-gen-gogoswarm/customnameid.go

@nichochar
Copy link

Any progress on this? It seems like a really interesting proposal.

@atombender
Copy link

Six months later, what's the status of this?

@dsnet
Copy link
Member Author

dsnet commented Sep 11, 2018

The v2 version of protobufs is in active development. Considering what proto options (if any) that we support is currently a low-priority item. This specific option is also blocked on #547 since there needs to be a stable way to let other "plugins" know that the name of a type changed.

@dsnet dsnet changed the title proposal: protoc-gen-go: support go_name option to specify Go identifier protoc-gen-go: support go_name option to specify Go identifier Mar 4, 2020
@euank
Copy link

euank commented Dec 28, 2020

I want to revive this issue.

Previously, it was blocked on figuring out how golang/protobuf would deal with plugins. Now that #547 is closed with "plugins don't exist in this project, just write your own protoc generator using protogen", it seems appropriate to talk through how this works in practice for this go_name issue.

The protogen library does make it much easier to write a custom protoc generator, which is great!

However, it also doesn't expose enough options to write a custom go_name extension.

Adding the additional options needed to support this was pretty straightforward: protocolbuffers/protobuf-go@master...euank:expose-ident-rewriting . With those changes, it was easy enough to write a goname-extension-supporting generator.

My main question after getting a go_name option working for myself is whether the protobuf-go repo would like the above change.
From my understanding, discussion of that project is supposed to happen in github issues on this project, so this issue seems like the right place to talk through if that's a reasonable thing for me to contribute.

@euank
Copy link

euank commented Aug 9, 2021

Bump on this issue. Would it be appropriate to submit that commit for inclusion in the upstream repo?
Again, from what I read last time, I should get a sign-off on the plan on this repo before submitting such a code change upstream.

So, to make it a concrete question: @dsnet would you like me to submit that change?
Alternatively, am I misunderstanding the contribution process, or is there someone else I should ping / ask?
If I don't get a response here in another 4 months, I'll file a new issue with the above questions.

@dsnet
Copy link
Member Author

dsnet commented Aug 9, 2021

I no longer work at Google and am thus not in a position to make decisions about what happens here.

That said, I suspect this feature is unlikely to ever be provided. Philosophically, this Go protobuf project aims to align itself with the wider protobuf project's goal of being a "language-neutral" mechanism for serializing structured data. The addition of options in the .proto file that tweak the generated API for a given language works against that goal. If go_name were provided, then it opens the door for all other languages to have tweakable options in the .proto file. This would result in the .proto file being incomprehensible in cross-language environments.

Understandably, this frustrates users that primarily operate in a pure-Go environment, but there are competing goals here. The goal of protocol buffers is to be good in all languages, but not great in any particular language (including Go, which is the language that I personally love).

@ydnar
Copy link

ydnar commented Aug 9, 2021

@euank in the meantime, you can use this: https://github.com/alta/protopatch

@dsnet congrats on the new gig. Say hi to Fitz!

@neild
Copy link
Contributor

neild commented Aug 9, 2021

Fundamentally, to support go_name (or the equivalent), we also need to have cc_name, java_name, objc_name, etc. The decision to add cross-language support for field renaming is up to the general protobuf maintainers, not the Go implementation.

I do not believe that it is likely that language-specific field renaming support will ever be added to protobufs. The benefits of slightly more idiomatic names in some languages are outweighed by the loss of predictability and greatly expanded clutter in .proto files. As @dsnet says, the goal of the protobuf project is to provide an implementation and language-neutral mechanism for serializing structured data, and language-specific customizations work against that goal.

I'm going to close this issue, because ultimately the choice here is not up to the maintainers of the Go protobuf implementation. If anyone does want to argue the case for language-specific field renaming, the way to do it is by following the process for adding new features to protocol buffers:
https://developers.google.com/protocol-buffers/docs/faq#can_i_add_new_features_to_protocol_buffers

@neild neild closed this as completed Aug 9, 2021
@puellanivis
Copy link
Collaborator

I think while closing this issue might be met with some sadness of many, it’s still the best choice for now. It needs to be clear that addressing this feature must be brought up with the protobuf team itself, rather than as a language-specific feature request. Leaving the feature request here makes it unclear who should be petitioned about make any sort of progress towards adoption of this feature.

This has been a long asked for feature that has been kicked down the road constantly, but with little commitment to the de facto decision we generally agree exists, “we cannot just make this change ourselves, it has to be directed from the protobuf team.”

I want to add and make clear, by closing this issue here, we’re not saying ”never”, we’re saying this is not the appropriate forum to get it the feature added. We just cannot move forwards on this unilaterally. Please, if you feel conviction for this feature, bring it up with the protobuf team and advocate for it there. They’re the better forum for this.

@dsnet
Copy link
Member Author

dsnet commented Aug 9, 2021

We can probably close #52 under the same rationale. Perhaps also #1225 and #414.

@euank
Copy link

euank commented Aug 9, 2021

I still have one remaining question. I'll happily agree with protobuf-gen-go not supporting a go_name with the reasoning above. That makes sense to me.

However, the change I've got above isn't actually implementing that. It's adding the knobs to https://github.com/protocolbuffers/protobuf-go to allow a custom generator (my-custom-gen-go) to add that feature.

We could say that the protobuf project doesn't support that option officially, but that the protobuf-go library is meant to be flexible enough to allow such a plugin to exist separately.

It probably makes sense to file a new issue for this. The main distinction being drawn here is "is protobuf-go only in service of protobuf-gen-go with no other purpose" or "is protobuf-go a general library for writing custom generator/plugins, and should it have a knob for a user to implement name rewriting if they desire"

@dsnet
Copy link
Member Author

dsnet commented Aug 9, 2021

We could say that the protobuf project doesn't support that option officially, but that the protobuf-go library is meant to be flexible enough to allow such a plugin to exist separately.

The older github.com/golang/protobuf module kinda had an undocumented way for plugins to co-exist. However, compatibility was always poor. It is really difficult to document what compatibility guarantees a plugin can or cannot depend on. As a simple example, a plugin could inject MarshalJSON methods. However, doing so would be incompatible if protoc-gen-go were to ever generate that method. Another example, a plugin could try to rename one of the declarations, which could break the main protoc-gen-go if it were to ever try and reference the original name.

Plugin compatibility is a really hard problem. Thus, the compatibility document says:

Generated additions: We reserve the right to add new declarations to generated Go packages of .proto files. This includes declared constants, variables, functions, types, fields in structs, and methods on types. This may break attempts at injecting additional code on top of what is generated by protoc-gen-go. Such practice is not supported by this project.

Of course this doesn't stop people from trying to mutate the generated code. This project makes no guarantees about not breaking such use cases in the future.

@neild
Copy link
Contributor

neild commented Aug 9, 2021

A custom generator can do anything it wants, so long as it uses only public APIs and creates types that implement the proto.Message interface.

What I think you're really asking for, @euank, is for a way for a custom generator to change field names in generated code in a way guaranteed to be supported by current and/or future versions of the google.golang.org/protobuf/runtime/protoimpl package. That's not something we will ever support: protoimpl is a support package for code generated by the official protoc-gen-go code generator and explicitly makes no other guarantees. The rest of the protobuf module is general purpose, but protoimpl is tightly coupled to protoc-gen-go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
generator-proto-option involves generators respecting a proto option to control generated source output proposal proposal-hold
Projects
None yet
Development

No branches or pull requests

9 participants