Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider making strong mode read check less eager #31391

Open
leafpetersen opened this issue Nov 16, 2017 · 11 comments
Open

Consider making strong mode read check less eager #31391

leafpetersen opened this issue Nov 16, 2017 · 11 comments
Labels
area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). language-strong-mode-future-work

Comments

@leafpetersen
Copy link
Member

There is an implied read check on the return value of getters or methods on generic classes when the return type contains a contra-variant occurrence of an unsound generic type parameter. The check takes the form of a cast to the type implied by the static type of the target of the read. That is, if x is C<int>, and C<int>.p has some type F, then x.p is rewritten tox.p as F . This is currently done even if the use of x.p is at a different type. Consider this example:

class A {}
class B extends A {}
typedef Callback<T> = bool Function(T);

class Model<P> {
  Model(this.callback);
  Callback<P> callback;
}

void test1(Model<A> a) {
  print((a.callback as Object) == null);
}

void test2(Model<A> a) {
  Callback<B> callback = a.callback;  // Callback<A> <: Callback<B>, so safe
}

void main() {
  var a = new Model<A>((a) => true);
  var b = new Model<B>((b) => true);
  test1(a);
  test1(b);
  test2(a);
  test2(b);
}

In test1, the callback is immediately cast to Object which should always succeed, but the implied read check causes a cast failure on the second invocation before the cast to Object can happen.

In test2, the callback is immediately cast to Callback<B>, which would succeed for both calls, but again, the implied read check cast causes a cast failure on the second invocation.

It seems reasonable to me to fix this by saying that the implied read check takes the form of a cast to T, where T is the surrounding context type if there is one, and otherwise is the type implied by the static type of the target (as is currently done).

@stereotype441 Would this be just a small tweak to the implementation, or would it require more work?

@lrhn @floitschG @eernstg @munificent @anders-sandholm @vsmenon WDYT?

@leafpetersen leafpetersen added area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). language-strong-mode-polish labels Nov 16, 2017
@stereotype441
Copy link
Member

Yes, I think this would be a fairly easy tweak to implement.

Note (from our in-person discussion): your proposed fix would only address test2. To address test1, we would need a further rule saying that the implied read check is never inserted when the surrounding context is an explicit "as" check. That should also be fairly easy to implement.

@lrhn
Copy link
Member

lrhn commented Nov 17, 2017

It's a change in strategy. Currently we are saying that the expression e.callback is unsafe, it's not guaranteed to satisfy its static type at runtime. When we add the check, we preserve the expression type safety.
What you do here is to ignore that and only check the type that is actually required by the context. That is, we allow an expression to have a runtime value that doesn't satisfy its static type.

The rule, as I read it, is to treat unsafe return values (contravariant ones) as just unsafe, like it's being Object (since we don't know what it actually is), and you are just doing a dynamic down-cast to the expected context type, independently of the static type (similar to what we do for arguments to convariant functions)
The expression of an as cast does not impose a type on the expression, so that down-cast (like in test1) is trivial. But then, so is comparison to null (_ == null does not impose a type constraint tighter than Object), so you shouldn't even need the cast.

I'm not sure I can find a way to abuse this (but I'll try!)
It only happens in an obtuse corner case anyway, and there will be a type check to ensure that it's still allowed in the context.
So the next question is: Why only here.
Example:

List<Object> x = <int>[1];
void Function(double) f = (num x) { print(x); }
f(x[0]);  /// static type :(void Function(double))(Object)

This is unsafe. We make it "safe" by adding a downcast as double on the argument expression, but (as Erik would point out), that's doing worse than necessary. What if we instead did as (actual argument type of function), then the call would succeed.

It feels like what you are doing here is similar to that, you are avoiding an (otherwise reasonable) runtime check for a statically unsafe operation, and allowing it to succeed anyway if the runtime types work out. It's different from my example in that you only use static type information, so maybe it's better.
I still think it's a little icky.

We can still get into problems like:

Model<A> m = new Model<B>();
var c = m.callback;  // c gets static type void Function(A), actual type is void Function(B).

Here we throw because the static type become the inferred type of the variable, and is not satisfied, because m.callback isn't type-safe. It's a little sad that we can infer a type and then fail the test, but m.callback is unsafe.

@eernstg
Copy link
Member

eernstg commented Nov 17, 2017

If T is a type and the static type of m is Model<T> (OK, then it should probably have been named ford rather than m ;-) then m.callback should be typed as Callback<Null> (because that is the best possible safe approximation using current types), or \exists S <: T. Callback<S> (because that's the best possible sound type, and it will give us better error messages relating to T).

However, we currently type it as Callback<T> and then evaluate it as m.callback as Callback<T> (that is, we add a caller-side check because it is known that this is a contravariant expression). As I mentioned elsewhere we can give an arbitrary expression e an arbitrary type U by compiling it as e as U, so there nothing stopping us from claiming that m.callback has that type, and it might even succeed at run time.

The proposal here would be to note that we have a sequence of several dynamic checks back-to-back in some situations, and only a proper subset of them (typically just the last one) are required for heap soundness (basically because we can accept that a compiler-generated intermediate variable has a different type than the "naive" statically known type of the expression whose value is stored in it), so we can simply omit all other dynamic checks from that back-to-back sequence.

The extra issue that comes to mind here is that the overly strict caller-side check for Callback<A> when a.callback is evaluated in the body of test1 is not just superfluous, it is directly raising a run-time error which is not justified by soundness requirements, and that means it's really helpful to omit it --- it is no longer just an optimization.

But it seems like a rather ad-hoc improvement of the situation that we avoid these unjustified run-time errors only in the situations where we happen to have a back-to-back sequence of dynamic checks.

So, as mentioned, I'd prefer to type contravariant expressions soundly in the first place, and also to generate dynamic checks that are justified by soundness requirements, and not just insert an overly strict dynamic check because we ignore a statically known kind of variance and we don't want to look up the real requirement for that dynamic check.

@munificent
Copy link
Member

munificent commented Nov 27, 2017

It seems reasonable to me to fix this by saying that the implied read check takes the form of a cast to T, where T is the surrounding context type if there is one, and otherwise is the type implied by the static type of the target (as is currently done).

It's not unreasonable but it feels like spooky magic to me. A couple of thoughts:

  • The looseness of the type system has let the user's program get into a weird state at runtime — there is a variable of type Model<num> whose value is an object of type Model<int>, even though Model<int> isn't actually substitutable for a Model<num> in practice. I don't think there's a lot of value in adding more language features to try to let it keep on truckin' even further off the path. They are already in the weeds. Dart has historically been designed around the idea that it should not keep on truckin' after an error has occurred.

  • I think inference is really important (obviously), but it's also important to not goo too far towards implicit magic. I don't expect users to have a crystal clear grasp of the downwards inference context of their code, so using that to elide a cast feels like it's using one hidden property and then using it to also hide what is likely a bug in their code. It feels like the language should ask the user to be more explicit about their intention here. If they are deliberately casting to a wider type in that context, why did they stuff the value in a statically narrower type in the first place?

  • I worry this may box us in when it comes to future type system features. If we later add variance control, will this feature interfere with that? Will it cause problems if we add smarter type inference and more things get more complex downwards contexts?

If there are some very compelling use cases where clearly idiomatic code is getting stuck by this, I can see it being worth doing. But otherwise, my intuition is that we should spend our cleverness budget elsewhere.

@eernstg
Copy link
Member

eernstg commented Nov 29, 2017

Currently we are saying that the expression e.callback is unsafe ..

But that's a misconcept in the first place: We should assign a sound type to each expression, rather than giving it some type which may or may not be justified by the semantics of the language, and then throwing if it happens to not have that type.

Otherwise we could just as well decide arbitrarily that all expressions have type String, and throw as soon as we have evaluated an expression whose value isn't a String. Voila, it's sound!

@vsmenon
Copy link
Member

vsmenon commented Nov 29, 2017

Another option is to make T in a contravariant position a static error unless we can add a callee-side defensive check (that effectively changes T to top from a soundness perspective).

class Model<P> {
  Model(this.callback);
  Callback<P> callback; // static error
  void method(P p) { ... } // ok due to callee-side type check
}

When we added the callee-side check in DDC, we did hit runtime errors - and they're rather confusing to users.

@leafpetersen
Copy link
Member Author

Another internal user just ran into this. Abstractly speaking, their code looks like:

class A<T> {
  void foo(T a) {...}
  T x;
}

class B<T> {
  void apply(A<T> a) {
     a.foo(a.x);
  }
}

void main {
  new B<dynamic>().apply(new A<int>());
}

@lrhn
Copy link
Member

lrhn commented Feb 21, 2018

Our covariant generics are unsound. The big question is when we should err because of that unsoundness.

We are currently going with "expression safety" where the runtime type is a valid subtype of the static type, and the static type considers type arguments as exact types.
It means that the static type of Model<num>.callback is bool Function(num), even if that's not guaranteed to be true.
The fix mentioned here is to treat Model<num>.callback as "unknown", which basically means Object and a following implicit down-cast to the context type. That's still safe, but it doesn't have expression safety - we are considering the expression as "not statically safe", so we know there isn't static safety, so we introduce runtime checks as necessary - not to ensure expression safety but to ensure runtime safety, Effectively, we give up on static safety entirely, and just assume that any Object can be the result of the unsafe expression.

Erik is proposing to consider a static type of Model<num> as meaning Model<? extends num>. It's not an exact type, but a set of possible types.
Assigning that to Object is always fine, all types in the set are assignable to Object. Assigning it to Model<int> requires a check (as Model<int>).
Now, invoking Model<? extends num>.callback returns something of type Callback<? extends num> aka bool Function(? extends num), which is a type with a lower bound of bool Function(Num), but no upper bound (the opposite of Model<? extends num>, which has an upper bound, but no lower bound).
So, the only Function type it can safely be assigned to is bool Function(Null), and the runtime type of the function can be any unary function (since (Object x) => true is assignable to bool Function(num) anyway).

Our type system doesn't handle that kind of reasoning now, but it could (although I doubt we can do that on short notice).

I'm open to both options, but don't see the latter as practically possible. It complicates the type system more than we can afford at this point, without getting much more than a few omitted down-casts of known unsafe expressions.

@leafpetersen
Copy link
Member Author

Sorry, definitely not something I want to take up now - I just was noting that this had come up again to remind myself for posterity.

Briefly to your comments @lrhn: I don't actually see this as a really fundamental change. If you think of our covariant generics as something like o : Model<num> means o : Exists<T extends num> in Model<T>, then what we do now on a read of o.callback is open the existential and try casting to Callback<num> (because all we know is that we have Callback<T> for some T. This change would just be a change in what we try casting to: instead of casting to Callback<num> and then to the context type, we just cast to the context type (if we have it). So I guess I don't see it as a change in the model: we know we have something that may not be of the type we expect, it's just a question of what type we try casting to.

I'm also actually quite interested in thinking about a broader approach around internalizing some notion of wildcard types so that we could expose to the user the fact that these types are vaguer than expected. We have other use cases for something like this, so if we could come up with a unified story around this, it could be quite nice.

Future work though.

@eernstg
Copy link
Member

eernstg commented Feb 21, 2018

Erik is proposing to consider a static type of Model as meaning Model<? extends num>

Indeed, or \exists T <: num. Model<T>, which may be useful when the quantifier does not sit at the top level. The point is that we should have a correct static analysis of the possible run-time values of any expression, such that the value of every typable expression will be guaranteed to have a type which is a subtype of the static type (and hence we will never need any 'read checks').

I understand that it may not be possible to start using such an approach overnight, but (1) I believe that "contravariant" and "wildly variant" expressions are pretty rare, and (2) if we make this switch from using read checks to using static types that are known to hold then it basically means that some expressions which are silently unsafe today (because they may fail during a read check) will be transformed into regular downcasts. They will then be silently unsafe by default during data flow, but --no-implicit-casts may be used to keep track of them (and the downcasts can now be replaced by is checks, if preferred). For member lookup the more general type may break some code (so e.foo may have to be modified, maybe to (e as C).foo, because e is no longer statically a C).

So even though there may be some breakage, I think it is likely to be small, and I think the required changes will be helpful for code comprehensibility.

@eernstg
Copy link
Member

eernstg commented Feb 22, 2018

Sorry, I forgot to reload before responding, so I didn't see your response, Leaf.

I don't actually see this as a really fundamental change.

The reason why I think it matters is that we are comparing an approach

  • (1) where an expression e gets a static type T such that e may not have that type, and then we just replace e by (e as T) to make it throw when e does in fact not have the type T

with another approach

  • (2) where e gets a type S which it is actually known to have, and then the evaluation succeeds. If the developer insists on using the value of e as a T, something like (e as T) must be written explicitly, or the downcast could arise implicitly, e.g., in some context like foo(e).

Whenever S is what the developer needs, the run-time failure in approach (1) is gratuitous, and whenever T is actually required, approach (2) ensures that the downcast is treated just like all other downcasts (e.g., if implicit it can be flagged with --no-implicit-casts, it can be replaced by an is check, etc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). language-strong-mode-future-work
Projects
None yet
Development

No branches or pull requests

6 participants