-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spec: embedding a type alias is confusing #17746
Comments
It's worse. If the package that exports
After clarification in spec, the compiler needs to be fixed. |
Since this affects type aliases, this issue would persist even if we restricted aliases to types only. |
It should be fairly simple to fix in the compiler: Currently, when we export anonymous fields, we deduce the field name from the type. We simply have to export the actual field name as well (perhaps only when it's different from the type name). |
By fix, do you mean treat |
No, treat |
It looks like the public APIs for reflect.StructField and go/types.Var already independently report name and anonymity/embeddedness, so that's good. However, reflect internally represents anonymous fields as an empty field name, which at runtime it replaces with the field type's name. We'll need to change that to allow named embedded fields. (Unless we do decide instead that |
Using the field name as written seems much clearer. It does mean that some types differing only by alias names are non-identical, such as these three:
|
CL https://golang.org/cl/32633 mentions this issue. |
CL https://golang.org/cl/32674 mentions this issue. |
I believe the behavior that Alan describes in the initial report is the right one. As I said on the CL, think of alias like a #define. Whether you name the alias or the actual definition of the alias, the resulting program should have identical behavior 100% of the time. Because "buffer" is an alias for "bytes.Buffer", embedding "buffer" behaves exactly like "bytes.Buffer". I agree that the embedded buffer being named Buffer is a little bit of a surprise, but the definition of alias is crystal clear about the effect:
I think it would be a serious mistake to make exceptions to that rule. Doing so would bar (or at least complicate) automated fixes that try to update code to use non-deprecated names in place of deprecated aliases. That said, the legacy aliases byte/uint8 and int32/rune do not behave this way: they take the name used in the struct. That implies to me that we should not refer to them as aliases, now that alias has a precise and different meaning. I filed #17766 for a reflect bug involving those. |
I appreciate that the spec change in https://go-review.googlesource.com/30601 breaks the "crystal clear" invariant you quoted, and makes structs nonidentical if their fields differ only by an alias name, but I think using the name as it appears for the field name leads to much more readable code. Also, the change you propose would break an equally important invariant: identifier case determines exportedness. You would be surprised to read this declaration
or perhaps see this documentation in godoc:
and discover that struct type S has a field called Foo (or perhaps even Bar). As for the "automated fixes", the lesson is: when you use aliases as forwarding pointers during a refactoring, be sure to give the alias the same name as the original. |
I gather that in this example the idea is that But what about exports? In this example, does the package have an exported type What is the key difference between those cases? In one case, the aliased name matters, and in the other case the alias name matters? |
The type alias b is not exported because it starts with a lowercase letter. The exported type of B2 is a struct containing a single "anonymous" field named "b" of type bytes.Buffer.
No, you can't compile even the declaration of B3 because an anonymous field must be a named type or a pointer to one, not an unnamed struct type.
The package would have an exported type alias B whose original is named bytes.Buffer and has the declaration struct {...etc...}. |
In the discussion on the alias proposal there was an alternative suggestion (by me, but that that's not important). It differs in that it does not bind any new (local) names. The observation I'd like to point out is that then some, if not all of the problems being discussed above are automatically solved - they cannot happen. |
@cznic You are correct - if we don't allow renames, the problem disappears. |
@rsc I disagree. I believe there's a difference between the behavior of the name (which aliases is all about: different names) and the behavior of the respective entity (type in our case). But I like to step back for a moment and contemplate a completely different approach. Currently we have an inconsistency. There are many ways to resolve this. I am enumerating some here to pique imagination a bit:
I am sure there's more. Arguments in favor of solution 1:
Finally, this would still address the primary use cases we see for aliases. Suggestion for simplified alias declaration syntax:
Examples:
Intuitively, each such declaration simply states that the qualified identifier is declared under the base name in this package. It is automatically re-exported. Or another way of looking at these declaration is to see them as "extern" or "forward" declarations of entities declared elsewhere. Implementation wise this would be a trivial change: Since we're in the freeze we'd want to keep major surgery to a minimum. But all we would need to do in the compiler is adjust the parser, which is a handful lines of code. |
I haven't thought long and hard about it but @griesemer's suggestion (option 1) appeals to me. |
It's certainly simpler, and the syntax is clever, but if we are to add this restriction I wonder if sticking to the existing syntax might not be better, for three reasons. The first is that we can extend it later should the need arise. The second is that this feature of the language will, we hope, be little used and therefore slightly obscure to many users. The existing syntax suggests its meaning clearly, and is less likely to cause surprise or be used by accident. In contrast, the concise syntax proposed here could trigger confusing compiler errors. The third reason is that it looks like just a use of a name, not a declaration, when in fact it's both. |
A restriction for Go 1.8 may make sense, but I think we should reserve flexibility for later. The "simplified syntax" seems too simple - it looks almost like a typo - and it does not have obvious flexibility for later. I am concerned that option 1 is too restrictive. The motivating use case for alias is when moving a declaration from package P1 to P2. It may be that P1.Name was a good name but that P2.Name is not a good name, because P1 and P2 convey different information. The restriction in option 1 limits the refactoring in potentially unhelpful ways. For example, when we moved io.ByteBuffer out of io, it became bytes.Buffer, not bytes.ByteBuffer. It would be sad if aliases effectively disallowed using the best name for the new location. I am also concerned that option 2 is too restrictive. For example the golang.org/x/net/context package's tests define
As of Go 1.8, the definition of Context in that package is
to redirect to the standard library definition. But the tests are worth keeping to make sure the definitions in the golang.org/x/net/context package do work. That test would no longer compile if we adopted option 2. I think 3 and 5 are non-starters, and I understand that others think 4 is a non-starter. Let's strike those. I think 6 would be terribly unfortunate, since we do have a problem to solve. A counter-proposal, intersecting options 1 and 2, so let's call it option 1.5: 1.5) If an alias is used as an anonymous struct field, its name must match its target's name. This avoids all the confusion we've identified without causing the more general problems in the specific examples I mentioned above. |
@alandonovan Parsing the existing syntax and the simplified syntax are almost identical. The lead is a an identifier followed by either "=>" or "." (depending on which approach we choose). Any confusion that can arise with "=>" can arise with "." . |
Isn't option 1.5 "too restrictive" for the same reasons you outlined for option 1? To continue your example, defining bytes.Buffer as an alias for io.BytesBuffer would have disallowed clients from embedding bytes.Buffer. Perhaps what makes option 4 unsettling is that it reveals that "anonymous" fields are not anonymous. All fields have names, but only an "anonymous" field declaration triggers embedding behavior. In fact it would be quite possible to decouple the issues of naming and embedding altogether. |
I certainly do understand the objections to 4. What's nice about 1, 1.5, and 2 is that any of them side-steps having to decide between 3, 4, 5. What's nice about 1.5 in particular is that it breaks fewer uses than either 1 or 2: it breaks only the uses that both 1 and 2 would break. That is, suppose you have these cases: A) Alias io.ByteBuffer => bytes.Buffer, with no embedded uses of the original. (1) breaks A and B. That seems like a win. The problem is caused by the unexpected interactions of renaming and embedding. Instead of disallowing one or the other always, disallow doing both at the same time. |
I'm not sure it's a good idea but I want to mention the possibility of (1.75) any aliased type may be embedded in a struct, but you may only refer to it directly if the name of the alias is the same as the name of the aliasee. That is, in the case of different names, you can embed to inherit methods, but direct references to the field are not permitted, as though there were a name collision. The type reflection information would have the qualified name of the aliasee--a name that can be printed but can not be referenced in Go code. |
@rsc The simplified notation is trivial to extend to allow "=>" should that need arise. Though this might really be a situation where perhaps we should give up flexibility of naming for simplicity and clarity of feature, even when we can't always get the best name at a new location. After all, we still believe we should use aliases sparingly. Regarding the "syntax almost looks like a typo": We will get used to it. Func declarations w/o a function body also look a bit odd, and it's a similar situation. Also, in the (likely) common case where the name doesn't change, we don't have stutter ( Regarding the other options I suggested: I agree that 2), 5) and 6) are not good solutions. In defense of 3) which is Alan's and my original suggestion for this issue: The spec arguably already supports this approach. This:
concerns "referring to", that is use of a name. If I declare an anonymous struct field using an alias
I expect the behavior of that anonymous field (which is referred to by it's alias Finally: Of course we can do suggestion 1.5 for Go 1.8. But it adds one more rule. It also "burns in" the new "=>" token. The suggestion 1) is more minimal and doesn't even require token changes. And I believe suggestion 3) solves the current issue fairly cleanly w/o further changes. |
For whatever it's worth, if I saw
in some code my assumption would be that it behaved exactly like embedding bytes.Buffer except that a user of this package could not refer to the embedded type by name as the alias is unexported. I would also expect
to fail to compile because I embedded buffer not Buffer. But likewise I'd expect:
to accept a bytes.Buffer value since they're type-identical. But that's also confusing because I'd expect an outside package to not be able to create a value of an unexported type. I think there's an inherent contradiction between aliases being another name for something and a name determining whether something is exported, which affects how it behaves and can be used. Maybe we need option 1.1 where you can rename an alias but it has to have export parity; that is, an exported identifier can only be aliased to an exported identifier, and, likewise, an unexported identifier can only be aliased to an unexported identifier. |
I appreciate the simplicity of If you want to rename a type in a LSC from pkg1.T1 to pkg2.T2, ideally the steps are:
Because you want step 3 to be a local operation that only affects pkg2, it's important that any field embeddings of pkg2.T2 be unaffected. I think that strongly suggests the implicit field name needs to be T2, not T1. (The inability to declare type aliases within a package unfortunately complicates renaming from p.T1 to p.T2, but it's possible as a two step operation: p.T1 -> ptmp.T -> p.T2.) |
I generally agree with rsc's "crystal clear" invariant being a good thing. I will be very helpful in writing automatic rewriting tools, which will be, I think, critical to have aliases actually disappear (or at least converge towards disappearing) down the line in the process of a refactoring. I think the main issue of 4 is more the unexpected exportedness than the naming. Much of the confusion of 4 can be avoided by requiring that an alias is required to have the same exportedness, as @alandonovan calls it, as the object it aliases. This also avoids confusion of whether @gri: I think not allowing renames is too strict. Although it is consistent with not allowing aliases to objects within the same package. If you allow renames, you probably should also allow the latter. |
FWIW: Us a consumer of code I would expect embedding with aliases to work as follows.... package t1
import "go/build"
type context => build.Context
type T struct {
context
}
var t T
var _ = t.context // lower-case context package t2
import "./t1"
var t t1.T
var _ = t.context // lower-case context |
I don't like option 1.5 because
succeeds or fails entirely on whether T happens to be an alias that happens to change a name. Even with a really good error message that is, if nothing else, annoying. |
The exact syntax was discussed and refined quite a bit in the proposal process, with significant community input. I strongly believe we should not reconsider that today without very good reason; probably we'd need to go back to the "proposal" phase. Let's please take syntax changes off the table and focus on semantics. The question before us in this issue is to define the semantics of aliases when used as embedded fields. Broadly speaking, there are two kinds of answer to this: either some specific definition or else some restriction that excludes the situation entirely. For Go 1.8, I personally lean toward the latter, because it buys us time to consider the specific definition more carefully later (if ever; maybe the right restriction will be livable, like happened with 3-index slices). From @griesemer and @robpike's replies it sounds like they lean toward the restriction approach too. I don't see any leaning one way or another in the other replies here. Q: Do we agree that for Go 1.8 we should somehow restrict aliases to avoid this situation instead of trying to define its meaning? Assuming yes, then there are four choices proposed so far:
Q: Which restriction should we adopt for Go 1.8? I think there are good reasons to think (1) and (2) are too broad. (1) breaks a rename we've done before, and (2) breaks a use of alias that exists today (but could obviously be changed). I think (1.5) is a reasonable compromise and that we could reserve (1.75) for when a real situation arose that justified it. If there are significant objections to (1.5) and the choice is between (1) and (2), I would opt for (1). Thoughts? |
1.5 seems like a safe compromise that leaves the door open for the future. |
Embedding is fundamental to the language. Peppering it with footnotes seems wrong. Especially since you'd have to know about how a type from another package that you're attempting to embed is declared to know whether you can. Option 1 (sans syntax simplification) isn't a terrible restriction, even if it would disallow some useful things temporarily. |
I think disallowing alias renaming (1) is the safest. I'm concerned that 1.5, 1.75, and 2 introduce opportunities where type p.T becomes a (possibly renamed) aliased type, and causes downstream embeddings to stop compiling through no obvious fault of their own. (Edit: That said, I don't feel strongly in any particular direction.) |
I still believe option 3) is viable and on the table. I have not seen a convincing argument against it. It would keep everything in place. My second choice is 1). |
@griesemer to clarify, option 3 means that given
then Is that a correct summary or did I get muddled somewhere mid-thread? |
@jimmyfrasche That is correct. An anonymous field is referred to by the name used to declare it - not the original name of the type. With respect to behavior (but not naming), the anonymous field behaves as if the original type was used (i.e., it promotes the types' methods, field names, etc.). That is, option 3) leaves everything as is but fixes the current issue, and I believe it fixes it in accordance with the spec (see https://go-review.googlesource.com/#/c/32633/). |
I think of aliases as hard links. That is, it establishes a second name for a type that is interchangeable for the first. This is important because in a refactoring scenario, the "primary" name for the type (eventually) changes. Given that background, I support (3). The name of the type used in the struct definition should set the name of the embedded field. |
Option 3 seems like the least surprising, most consistent interpretation. Embedding is defined in terms of both the embedded type and its name. When a type has multiple names, use the one that was used to do the actual embedding. If there has to be a restriction for Go 1.8, option 1 seems a reasonable compromise that includes a lot of the other planned restrictions for free. The options that disallow embedding in circumstances that can be outside your control are a no-go for me. |
I agree with @mdempsky that all of 1.5/1.75/2 should be considered unsafe options. With any of [1.5,2], if I wanted to use an alias, I'd first need to deprecate using the aliased type embedded. It effectively doubles the refactoring-effort, because I now first need to find and fix all users of my code that use the type embedded(ly?). 1 seems, in that regard, safe; it may break use-cases, but it breaks them at the source, that is, the place where an alias is introduced, instead of used. So, not all of the breakages described by @rsc are equal, IMHO. I agree that it would be unfortunate if aliases could not also be used to rename an exported thing, however, 1 would keep the door open to allow that later. I also find 3 slightly confusing, but to me personally, that confusion does not outweigh the benefits of allowing all known/mentioned use cases. As others have pointed out, it's still logically consistent, it's just a stumbling block for beginners. I'd see it as similar to the nil-pointer in interface thing or that fields to be json-marshalled need to be exported; confusing when you don't know about it, but logically consistent and once it's explained the first time, it's easy to remember and internalize. |
I'm in favor of (1):
One question about (1): Are there alias use cases in which it is critical that the package author have the ability to change from exported to unexported? Or an argument that those don't exist? I personally find the |
I think not allowing renaming alias makes aliases much less useful. Think
rsc's example of io.ByteBuffer and bytes.Buffer. Choosing good package
names to simplify exported names is so essential to idiomatic Go style that
I expect non-trivial amount of aliases be renaming.
Let's not forgetting the non-refactor use of aliases too (integration of
multiple internal pacakages into a API only top-level package, renaming
aliases would play a key role there.
Also, because I expect relaxing this restriction later will likely receive
similar resistance like the introducation of aliases received today
(because it makes alias abuses much worse, which is the main argument
against alias), I think if we make this restriction today, we will never be
able to remove it.
|
I don't understand what the type reflection information looks like for option 3. Are we going to set |
@jimmyfrasche, thank you for the observation that (1) constrains people writing alias statements and (2) constrains people using code that happens to use alias statements. That's an important distinction, and I think it's a good argument that (1) is preferable to both (1.5) and (2). |
@ianlancetaylor I haven't looked at the reflection information yet. I have a pending CL for the necessary export data change (still missing tests besides my manual tests). It does contain an extra field name if the "name" of the anonymous field doesn't match the base type's name. I suspect something similar will need to be done for reflection info. |
Rereading the thread, it looks like we're down to option 1 or option 3. It also seems the arguments for and against each hinge on what a type alias means. Given a type alias in a package The disagreement seems to be whether the name In the #define interpretation it is as if instances of The hard link interpretation is that Option 1 allows both interpretations to behave the same for embedding. Option 3 uses the hard links interpretation. Consider
By the #define interpretation, the field name of the anonymous field is By the hard links interpretation, the field name of the anonymous field is The #define interpretation has the same effect as options 1.5/1.75/2: it constrains people using code that happens to use the alias statement. They have to remember that even though they embedded One of the major use-cases for aliases is refactoring. Renaming aliases is an important part of that, as mentioned. Under the #define interpretation, refactoring is inhibited when embedding is used. Say the type However, under the hard links interpretation nothing has to be changed: the field name is Even if this decision is delayed for now by going with option 1, I think the outcome will still have to be option 3 in the end. |
The restriction that aliases can only reference names in a different package already restricts the renaming operations that can be performed. Under the original proposal you can rename Extending this restriction to cross-package renames (option 1) doesn't seem onerous to me. If it is, then perhaps that's an argument against limiting aliases to names in different packages. |
@neild You can still rename p.T=>p.U, it just requires extra steps and an extra temporary package during the transition. For example:
|
I'm in favour of option 3. The types are indistinguishable in all but name, but naming is precisely what aliasing is all about. It also fits with the existing embedding semantics of built in types such as byte and rune - to define these exactly as aliases would be a nice bonus (issue #17766 is indicative of the confusion in this area). |
I would argue that it's the latter. Option 3 seems best from a user perspective: it doesn't break things nearly as badly when moving an existing type out of a package, because you can drop in a forwarding alias and everything still basically works as expected. |
FWIW (which isn't much :), I'm strongly in favor of an approach which allows byte and rune to be defined as aliases. It's just simpler. |
"An alias can never be lower case" seems to solve the problem and is consistent with the fact that aliasing is about forwarding the exposed object definitions of a package. Renaming feels quite important in order to avoid naming conflicts and as a way to create parametrized packages (aliases as a kind of naming indirection allows to switch type definitions quite more easily). Options 4 still makes the most sense to me (aliases should be almost (*) completely transparent at runtime since it is a compile-time mechanism and aliasing has no incidence on type identity, by definition). Concerning error messages, I guess that the line numbers will help disentangle any discrepancy between alias vs. original name. ( * ) Ideally, there should be local package information when an object is aliased and part of an error message. |
Locking because aliases are no longer proposed. |
When you embed a type alias within a struct, the resulting struct type has a field of the alias's apparent name, but its run-time type descriptor uses the alias's original name. For example:
This is potentially confusing to the reader, and I suspect it will break a number of analysis tools that assume the name in the syntax of the embedded field declaration is the same as the name of the field.
Something to think about.
The text was updated successfully, but these errors were encountered: