[SE-0287] [Sema] Implementation for implicit member chains #31679

Jumhyn · 2020-05-09T18:08:40Z

This is the implementation for a forthcoming proposal (see this discussion thread) that extends implicit member syntax to allow additional member accesses chained off of the base UnresolvedMemberExpr.

Resolves: rdar://problem/57295228

lib/AST/Expr.cpp

test/Constraints/diagnostics.swift

test/Constraints/construction.swift

lib/Sema/CSSimplify.cpp

test/expr/delayed-ident/member_chains.swift

lib/Sema/ConstraintSystem.h

CodaFi · 2020-05-09T22:50:55Z

So, we can look at this problem from the inside out the way you’ve got it, but I wonder if it wouldn’t be cleaner to try it the other way around. The intent is to improve inference around an apply where the base of a sequence of member references is given contextually. So why not have the typing rules for applies cooperate here? We could try to look through an apply to bind the contextual type of whatever parameter to an implicit TypeExpr at the head of the chain. That should handle operator applies for you as well.

Jumhyn · 2020-05-10T00:09:57Z

@CodaFi Just to clarify that I'm understanding correctly, you're suggesting that in a construction like:

foo(.my.member.chain)

The type checking for foo would be responsible for looking through the parameter, seeing if it's the tail of an implicit chain, and binding the base type appropriately?

That sounds like it could work out nicely, but I worry that it misses locations where type inference should be able to determine the contextual type but we forget to look through all the relevant expressions. Please correct me if I'm misunderstanding, but just focusing this on apply expressions would miss the inference in return statements, pattern binding decls, assign expressions, and possibly others. Moreover, whenever a new construct is added to the language, we'd have to make sure to consider the possibility of each sub-expression being part of an implicit member chain in order to bind the base type appropriately.

With the current implementation, the base type of the chain is automatically bound to whatever contextual type we have by the expression at the end of the chain. Am I missing something?

test/expr/delayed-ident/member_chains.swift

CodaFi · 2020-05-11T01:42:39Z

Right, as written the explanation I gave was... incomplete. Let me try to rephrase with the benefit of having puzzled through more of the details of the implementation here.

Essentially, you have the right idea about the typing rules here, the question is how best we propagate contextual information to the heads of the member reference chains. Your implementation does so by modeling the backpropagation explicitly in the AST, whereas I was trying to request we instead specialize the typing rules in CSGen. I believe this would better model the heterogeneous version of this proposal anyways since you are only using the back-pointers to locate from "tail" to "head" as it were.

Now, as I'm sure you've already noticed, this is pretty tough because syntactically the "head" of the chain with the syntactically leading UnresolvedMemberExpr is, in fact, the innermost expression in the AST. But I think as long as we have to eat the cost of an AST traversal to detect the chains we have to rebind, let's just eat that cost in one place.

At the end of the day, all we need to do is bridge the existing connected components here by getting an equality constraint between the tail and the enclosing contextual type.

Jumhyn · 2020-05-11T04:54:38Z

Thanks for your thoughtful feedback @CodaFi, that all makes sense.

I agree with you that lifting this out of the AST is cleaner. There's a couple spots where I worry we lose the information about the "head" of the chain, specifically in the spots where we're calling isUnresolvedMemberChainTail to produce a better fix in CSSimplify. I'll investigate if there's a way to produce the desired fix without walking back down the tree, but it may not be possible without caching the member-chain-base info somewhere--do you think it's acceptable to re-traverse the chain in these cases since we're only hitting these checks in a failure case anyway?

Jumhyn · 2020-05-11T05:07:28Z

Also, the latest commit has the member-chain stuff lifted out of the AST and into the constraint system. I've reworked CSGen (and isMemberChainTail so that we should only make one traversal of each chain over the course of type checking (aside from the issue with diagnostics noted above). Let me know how this looks to you!

Jumhyn · 2020-05-11T14:29:04Z

lib/Sema/CSGen.cpp

@@ -3298,7 +3337,7 @@ namespace {
      CS.addConstraint(ConstraintKind::OptionalObject,
                       CS.getType(expr->getSubExpr()), objectTy,
                       locator);
-      return objectTy;
+      return addUnresolvedMemberChainConstraints(expr, objectTy, locator);


We might be able, here and elsewhere, to lift this into the ConstraintWalker itself in walkToExprPost so that the logic for which expressions to "look through" for chains is contained entirely in getMemberChainSubExpr. That would require us to pay the cost of the getMemberChainSubExpr(expr) == nullptr on every expression, though, as opposed to just the ones that we know might be a part of an unresolved member chain. I think it would also require a new path element kind (or another function getMemberResultLocator that gives us the locator we should use to match the member result to the whole chain result)...

Jumhyn · 2020-05-20T21:26:42Z

@xedin I'd also love your feedback on this PR which implements multi-member chains (without changing the current rules for generic arguments at the head and tail matching).

xedin

Only big question for me here is whether it would be possible to constrain recording of the base type and determining linked chain to the Constraint{Generator, Walker} instead of constraint system. It seems like it should be possible to determine tail and head of the chain by lookahead in ConstraintWalker::walkToExprPre and tie up the chain in ConstraintWalker::walkToExprPost once tail expression is reached again. WDYT?

xedin · 2020-07-29T19:08:16Z

lib/Sema/CSGen.cpp

+      // (represented with a new type variable) must equal the base type.
+      if (CS.isMemberChainTail(expr)) {
+        auto *chainBaseExpr = CS.getMemberChainBase(expr);
+        if (auto *UME = dyn_cast<UnresolvedMemberExpr>(chainBaseExpr)) {


Should this be a castTo<UnresolvedMemberExpr> instead since base of the member chain is always an UnresolvedMemberExpr?

I don't think so—getMemberChainBase works for all member chains, including, say foo?.bar!.baz() (where the base is a DeclRefExpr), so we do actually want to check here whether the member chain has a base which is an UnresolvedMemberExpr, as opposed to anything else that could be sitting at the base.

Perhaps this method would be better named something like maybeAddUnresolvedMemberChainConstraints, that just starts getting pretty long.

I'm asking because getMemberChainBase is only used in combination with isMemberChainTail which hints at it being tailored for unresolved member based chains. It seems like all of this functionality could be localized to ConstraintWalker and chains like this determined upfront...

Jumhyn · 2020-07-29T20:44:10Z

Thanks for circling back on this @xedin! I mentioned a similar possibility up-thread. I think something like that would be possible, with a couple caveats:

I believe we do in fact need to track of UnresolvedMemberBaseTypes on the constraint system itself. It's used in validateInitializerRef (via getUnresolvedMemberBaseType) in CSSimplify to properly form an initializer call from an UnresolvedMemberExpr.
If we lift the call to addUnresolvedMemberChainConstraints into ConstraintWalker, we would be paying the cost of the isMemberChainTail and getMemberChainBase calls on (potentially) every expression we visit, instead of how it is now where we only pay that cost when we visit expressions that might actually be part of a member chain.

Regardless, I think it's perfectly reasonable to move getMemberChainBase and related methods under ConstraintGenerator rather than the constraint system. Those are definitely not needed outside of generation.

If I've missed something let me know! Interested to hear your thoughts on the tradeoff here.

xedin · 2020-07-29T21:01:13Z

I believe we do in fact need to track of UnresolvedMemberBaseTypes on the constraint system itself. It's used in validateInitializerRef

It looks like validateInitializerRef already looks for a type variable representing MemberRefBase in the set of available type variables. I think it's okay to keep it that way. In the future we could arrange type variables into a MapVector based on their locators to mitigate the cost.

If we lift the call to addUnresolvedMemberChainConstraints into ConstraintWalker, we would be paying the cost of the isMemberChainTail and getMemberChainBase calls on (potentially) every expression we visit, instead of how it is now where we only pay that cost when we visit expressions that might actually be part of a member chain.

The problem I see with current approach is that the logic is scattered around constraint generator. I think regular expressions are better off not knowing whether they are a part of the chain or not. Having everything in one spot IMO going to help maintenance in the long run even if constraint walker has to lookahead to determine if expression is included in the chain.

Just to clarify what I'm thinking - if constraint walker encounters an expression which could be a tail of the chain it would spin a separate walker to determine whether there is a chain from this member or any other sub-member, record such chains so all of the intermediate members could avoid the check, this is effectively a depth-first search of chains from a possible tail.

Jumhyn · 2020-07-29T21:29:46Z

The problem I see with current approach is that the logic is scattered around constraint generator. I think regular expressions are better off not knowing whether they are a part of the chain or not.

Makes total sense. It's been a bit since I've touched this implementation but give me a bit and I'll see where your suggestions take me. Thank you!

xedin · 2020-07-29T22:40:30Z

@Jumhyn No worries! Please re-request the review when you are ready otherwise there is a high chance that I'm going to loose track of this again :/

Jumhyn · 2020-07-29T22:57:51Z

@xedin Will do! I realized we're also using getUnresolvedMemberBaseType in CSApply as part of visitUnresolvedMemberExpr. With single-member chains we could use the type of the UnresolvedMemberExpr itself as a stand-in since they were guaranteed to be equal, but that assumption doesn't hold anymore. Do you have a suggestion for how we might be able to deal with that without caching the UnresolvedMemberExpr base types in the constraint system?

(Also, FWIW, I don't think I can re-request your review until your current review request is cleared—looks like just a comment doesn't reset that state)

xedin · 2020-07-29T23:00:51Z

Sorry I forgot to mention that in the review - I think this check is obsolete because diagnostics no longer re-typecheck sub-expressions.

Jumhyn · 2020-07-29T23:03:53Z

@xedin Unfortunately it's used for more than just the diagnostic—I thought that too initially, but we actually use baseTy throughout that method to turn the UnresolvedMemberExpr into a MemberRefExpr with the appropriate (metatype) base. In order to form the base, we need access to the base type.

xedin · 2020-07-29T23:13:18Z

Hm, I think we might be able to extract that from OverloadChoice associated with SelectedOverload picked for that member?

Jumhyn · 2020-07-29T23:16:52Z

Ooh, nice. That looks promising. Thank you for your help!

Jumhyn · 2020-07-30T02:38:56Z

@xedin Unrelated to the above, it looks like some changes have gone in since I last worked on this that makes my model for the member chain constraints not quite work. When we generate the constraints for an expression like,

let _: Foo = .optFoo!

the current implementation here ends up with four type variables:

The base of the chain.
The result of the .optFoo access.
The result of the force unwrap.
The result of the entire chain.

In the constraint system, the type variables actually associated with each expression looks like:

.optFoo!
^  ^   ^
T1 T2  T4

The extra level of indirection was introduced to handle the fact that a) the base of the chain can be an optional of the result type of the final member of the chain and b) the result of the entire chain might be an lvalue even though the base isn't.

However, the fact that T3 isn't actually tied to the result of an expression causes an issue when we get to type coercion. If optFoo is a static var, we get the following solution:

T1 as Foo
T2 as @lvalue Foo?
T3 as @lvalue Foo
T4 as Foo

The fact that T3 doesn't actually get represented in the AST means that coerceToType never gets a chance to insert the proper LoadExpr to perform the lvalue->rvalue conversion. However, I'm having trouble coming up with a way of modeling the constraints that doesn't introduce a level of indirection, which inevitably leaves some type variable out of the AST.

Do you have any thoughts about a path forward?

We previously were not properly handling the diagnostics for using an rvalue implicit member on the left hand side of an assignment. This adds the proper handling and extends it for member chains.

…ains

Since the user can now write additional member accesses off of an UnresolvedMemberExpr, we should offer all available completions rather than just those that match the contextual type.

Instead, an expresison like `.foo()` is represented as an `UnresolvedMemberExpr` nested inside a `CallExpr`.

Jumhyn · 2020-08-27T15:01:50Z

@rintaro (or anyone) mind running tests again? Just rebased to fix failures.

rintaro · 2020-08-27T15:05:50Z

@swift-ci Please test

swift-ci · 2020-08-27T15:08:52Z

Build failed
Swift Test Linux Platform
Git Sha - 321a679e0cd4f3cac6a4e530c6488b1a2af8e62c

swift-ci · 2020-08-27T15:09:17Z

Build failed
Swift Test OS X Platform
Git Sha - 321a679e0cd4f3cac6a4e530c6488b1a2af8e62c

Jumhyn · 2020-08-27T17:03:43Z

@rintaro Once more? 😇

rintaro · 2020-08-27T17:20:36Z

@swift-ci Please test

swift-ci · 2020-08-27T17:20:59Z

Build failed
Swift Test Linux Platform
Git Sha - 0af9cb469158574d5f320a483c3b91a296767206

swift-ci · 2020-08-27T17:24:25Z

Build failed
Swift Test OS X Platform
Git Sha - 0af9cb469158574d5f320a483c3b91a296767206

xedin

Looks good to me, turns out that idea to introduce an implicit result AST node paid off in more than one way!

hamishknight · 2020-08-27T17:44:18Z

@swift-ci please test source compatibility

swift-ci · 2020-08-27T23:16:38Z

Build failed
Swift Test OS X Platform
Git Sha - 8905f97

Jumhyn · 2020-08-28T00:14:06Z

Hmm, the macOS failure seems like a flake to me, could someone with more familiarity with the CI confirm and kick off another run?

xedin · 2020-08-28T00:15:39Z

@swift-ci please test macOS platform

Jumhyn · 2020-08-28T12:12:53Z

There we go 🙂

Huge thanks to @xedin, @rintaro, and everyone else who provided feedback and guidance on this implementation! I started off with basically no knowledge of the type checker so this was a great learning experience.

theblixguy · 2020-08-28T17:12:37Z

@swift-ci please test Windows platform

rintaro · 2020-08-28T17:48:59Z

Thank you Frederick!

xedin · 2020-08-28T21:29:32Z

Thank you, @Jumhyn and happy to help!