Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thesis on improved type system #5168

Closed
ivogabe opened this issue Oct 8, 2015 · 19 comments
Closed

Thesis on improved type system #5168

ivogabe opened this issue Oct 8, 2015 · 19 comments
Labels
Discussion Issues which may not have code impact

Comments

@ivogabe
Copy link
Contributor

ivogabe commented Oct 8, 2015

Hi everyone,

I'm currently starting my bachelor thesis, as part of the Honours Program at the University of Utrecht. The subject will be something with type inference in TypeScript. The final product - besides the written thesis - will be an implementation of the algorithm in TypeScript.

My plan is to fork the TypeScript compiler, and it'd be great if my changes would be merged into TypeScript. My subject is not yet fixed, so I'd like to discuss here which subject would be most suitable.

Here are the two ideas I've got:

1 - Control flow based typeguards.

Example:

let x: string | number = ...;
if (typeof x === number) {
    // x: number
    x = x.toExponentional();
    // x: string
} else {
    // x: string
    x = x.substring(1);
    // x: string
}
// x: string
console.log(x.substring(1));

A different kind of type guards could be implemented like this:

function expectString(x: any) x as string {
    if (typeof x !== 'string') {
        throw new Error('Expected string, got ' + typeof x);
    }
}
let x: string | number = ...;
// x: string | number
expectString(x);
// x: string

This typeguard can be useful when checking the provided arguments in for instance a nodejs server or a public api.

2 - Improved inference using function bodies

Example:

function square(x) {
    return x * x;
}
// Current: square(x: any): number
// Expected: square(x: number): number

function squareLength(x, y) {
    return square(x) + square(y);
}
// Current: squareLength(x: any, y: any): number
// Expected: squareLength(x: number, y: number): number

Dificulties are polymorphic functions (like the identity function (x) => x) and recursive functions. Algorithms that can handle these functions exist (Cartesian Product algorithm or (Extended) Monotone Framework). Such algorithm has been implemented for PHP (without type annotations, only the standard library had hard coded types). This would also be useful for editors which use the language services for JavaScript code, as a lot more types can be inferred. Support for objects will be limited (see below), given the time for my thesis. Limited support for objects is probably good for performance, too. Overall performance will depend on the precision, in some cases we can for example fallback to any to get reasonable performance.

Limited support for objects means that methods, that could be part of an object on the heap, won't be tracked, like the following example:

declare class A { foo(): string; }
declare class B { foo(): number; }
let x;
let y = x.foo();

The algorithm in the PHP implementation would infer y to string | number. Not supporting this means y is typed as any, which would have my preference, as it would fit better in TypeScript. I haven't chosen a specific algorithm yet, I'd like to get some 'requirements' to choose one, for instance, what should happen with recursive functions (a function might call itself, or a function f might call g, that might call f).

I'd like to get some feedback on these ideas. Would you be interested in merging these two, or one of these, into the compiler? Or do you have some related suggesions that I could implement?

@RyanCavanaugh
Copy link
Member

The first idea is similar to something we've been thinking about for a while; see #2388. I think if we had a good implementation of type guards based on control flow, we'd be likely to merge it in. The x as T throwing pattern doesn't seem like something we would support, though perhaps sufficient data (e.g. this pattern is very common) could be presented here.

The second one, probably not. We definitely could have implemented a more sophisticated inference algorithm from the beginning (a la flowtype), but our inference algorithms are deliberately more straightforward. The reasons for this are performance and predictability -- inference from function implementations generally requires whole-program analysis (not cheap), and it's desirable to be able to follow along with the type inference without needing the entire program text to be able to make definitive statements.

@RyanCavanaugh RyanCavanaugh added the Discussion Issues which may not have code impact label Oct 8, 2015
@ivogabe
Copy link
Contributor Author

ivogabe commented Oct 8, 2015

I'd seen that issue too. About the x as T guards, personally I expect that I would be using that pattern a lot. When I write a nodejs server I use a lot of function like this:

function expectString(x: any): string {
    if (typeof x === 'string') return x;
    throw new Error('Invalid type, expected string');
}
// ...
const username = expectString(args.username);
const password = expectString(args.password);

I would prefer writing this using the as guards. I might be the only one who uses this, I will investigate what users are using most and I'll come with a more detailed proposal later.

I understand the concerns with the second idea. It might be possible to do some analysis locally in a partially typed program, but that would still be bad for the predictability. The first suggestion would probably be more suitable indeed.

@mhegazy
Copy link
Contributor

mhegazy commented Oct 8, 2015

One thing to consider is the tooling scenario. for instance JS tooling for instance. you could use some of the ideas in 2 to provide better suggestions in IDE's. this gives you a lower bar for correctness compared to issuing errors in user code, and a lower perf bar compared to a whole program typechecking as well.

I would be interested in seeing what possible function body inference techniques can be used and what is the value on the end user experience.

@ivogabe
Copy link
Contributor Author

ivogabe commented Oct 9, 2015

@mhegazy That sounds like an interesting suggestion. Do you see that as an addition to TypeScript / the JS language services, or more general as a separate project? For completions in the function body the call sites can be used to give completions.

An additional idea, also related to tooling and similar to the second idea, could be to suggest types. A new context menu option could be added that suggests a type for an untyped parameter. This could even be introduced as a 'quick fix' option for noImplicitAny errors in Visual Studio. Small example:

function square(x) { // Error: x has type 'any'
  return x * x;
}

A quick fix would be shown on the 'x' in the function header, saying 'Fix: add type annotation (x: number)'. This type can be determined by using all call sites or the function body. This will probably have no performance issues as only one variable, is inferred, not all (though the algorithm might infer other variables too, to get a good result). What do you think of such feature?

@mhegazy
Copy link
Contributor

mhegazy commented Oct 14, 2015

Sorry for the delay.

Do you see that as an addition to TypeScript / the JS language services, or more general as a separate project? For completions in the function body the call sites can be used to give completions.

I think there is room in the Language Service side to do more of this, either as you mention as quick fixes, or even as a way to provide better JS editing experience.

Here are issues we are actively pursuing to enable better JS tooling using the TS core analysis engine (type system + language service): #4793 and #4790; and here is a branch where this work is going on: #5044.

An additional idea, also related to tooling and similar to the second idea, could be to suggest types. A new context menu option could be added that suggests a type for an untyped parameter. This could even be introduced as a 'quick fix' option for noImplicitAny errors in Visual Studio. Small example:

👍 🎉

We have experimented with the quick fix hookup, but have not had the time to fully implement it. I think this is going to be done in the near to medium term.

@ivogabe
Copy link
Contributor Author

ivogabe commented Oct 17, 2015

I've discussed the ideas with my supervisor. I'll first take a look at the extended type guards and make two implementations: one "quick and dirty" implementation that fits good in the current compiler and one based on an existing theorem ("Monotone Framework"), and compare their performance and accuracy. The latter implementation can optionally be used to implement the quick fix idea or the tooling ideas, I need to decide whether I do that or just focus on the type guards later on.

@jbondc
Copy link
Contributor

jbondc commented Oct 18, 2015

@ivogabe You may want to look at:
http://pag-www.gtisc.gatech.edu/courses/cs6340/dataflow_analysis_part2.pdf

The Bit Vector framework is useful and performs well.

@falsandtru
Copy link
Contributor

+1 for the Control flow based typeguards.

@ivogabe
Copy link
Contributor Author

ivogabe commented Dec 6, 2015

I have experimented with flow based type guards. Basic functionality is working, but the compiler cannot compile itself yet. You can find my code here

I've extended the reachability checks, instead of passing through only the reachability state, I also keep track of a list of previous occurrences of 'flow markers' and type guards. Every flow marker gets a property with the previous flow markers and guards. With that information, the checker can find previous occurrences of an identifier. A flow marker is an identifier (used as an expression) or an iteration statement (to simplify binding of iterations). Narrowing with iteration statements is not fully working yet, as the checks that prevent infinite recursion (x has the type of x, which has the type of x...) are too strong for that. The checker takes the union of the types of the previous occurrences. Types are narrowed using the type guards, that already existed, and assignment.

I'd like to get some feedback on the following issues I had:

1 - User defined type guards

From checker.ts:

function getSymbolOfEntityNameOrPropertyAccessExpression(entityName: EntityName | PropertyAccessExpression): Symbol {
        if (isDeclarationName(entityName)) {
            return getSymbolOfNode(entityName.parent);
        }

        if (entityName.parent.kind === SyntaxKind.ExportAssignment) {
            return resolveEntityName(<Identifier>entityName,
                /*all meanings*/ SymbolFlags.Value | SymbolFlags.Type | SymbolFlags.Namespace | SymbolFlags.Alias);
        }

isDeclarationName is a type guard with return type name is Identifier | StringLiteral | LiteralExpression. Because of that, the cast to Identifier in the next if statement is disallowed since this entityName is narrowed as if it was in the else block of the first if. I get this error a lot in checker.ts. What would you think of 'weak' user defined type guards?

2 - Assignments

This code doesn't compile:

let x: { prop?: number; };
// x: { prop?: number; }
x = {};
// x: {}
x.prop = 42; // Error

I think that the best solution would be to prevent such narrowing. Only narrowing of union types (remove all parts to which the expression is not assignable to). I think subclasses can also be narrowed. Generics cannot be narrowed:

let x: {}[];
x = ['lorem']; // Don't narrow to x: string[]
x.push(42); // Should work

@ivogabe
Copy link
Contributor Author

ivogabe commented Dec 6, 2015

@jbondc The Bit Framework requires that L, the set of all type, is the power set of a finite set D. The power set of a set D is the set with all subsets. That's how I'll represent types too, as the union of all types in such subset. The set of types is not finite in TypeScript, you can for example create these types:
string[], string[][], string[][][], string[][][][], ...

@DanielRosenwasser
Copy link
Member

What would you think of 'weak' user defined type guards?

I think that had user-defined type guards been flow-based from the beginning, we wouldn't have coded that that way. I'd favor that sort of breaking change, but it would definitely be frustrating.

If you're running into a lot of issues like that, and it's more trouble than it's worth, you can treat user-defined type guards as "weak" as a work-around.


Perhaps you should check for strict subtypes before narrowing in your cases. To quote the spec:

The assignment compatibility and subtyping rules differ only in that

  • the Any type is assignable to, but not a subtype of, all types,
  • the primitive type Number is assignable to, but not a subtype of, all enum types, and
  • an object type without a particular property is assignable to an object type in which that property is optional.

Which works out because

  1. You'd never want to "narrow" to any.
  2. You don't want to lose out on optional properties.
  3. It's questionable whether you want to narrow from an enum to number, but for practical situations like var x = Flags.None; x = Flags.A | Flags.B;, it probably is more desirable.

@ivogabe
Copy link
Contributor Author

ivogabe commented Dec 6, 2015

If you're running into a lot of issues like that, and it's more trouble than it's worth, you can treat user-defined type guards as "weak" as a work-around.

It happens 9 times in checker.ts, other files don't seem to have such issues. I could also add a new type guard syntax, like x is? string that would keep the existing behavior of the normal type guard syntax. (Though flow based type guards would be breaking any way, I'm afraid)

Perhaps you should check for strict subtypes before narrowing in your cases. To quote the spec:

As far as I can see that wouldn't solve the array and generics issue. I could solve that by not narrowing generics.

@weswigham
Copy link
Member

Traditionally, we've considered assignments exempt to type guard narrowing (or rather that they disable narrowing within that branch), I believe in order to preserve the assignment restrictions of the original declaration. If we retain this semantic, your second problem doesn't occur, right?

@ivogabe
Copy link
Contributor Author

ivogabe commented Dec 7, 2015

@weswigham Yes, that's right. Though it's relatively easy to add this with flow based type guards, and it will allow code like this:

function foo(x: string | string[]) {
  if (typeof x === 'string') {
    x = [x];
  }
  // x: string[]
}

@ivogabe
Copy link
Contributor Author

ivogabe commented Feb 7, 2016

I've opened a PR #6959. The compiler can compile itself, though one test hangs and the code needs to be cleaned up. Can someone take at look at it? I have two questions in the PR.

@ivogabe
Copy link
Contributor Author

ivogabe commented Nov 24, 2016

Long time without an update, though I have some good news now. It took a bit longer than expected, but it is almost finished now. My supervisor and I have submitted a paper based on the thesis to a conference and they accepted it. We're now working to finish the paper, and we would like some feedback of yours. So is there someone of the TypeScript team that wants to read it? I can send the paper by email. We have to finish the paper on Monday, so it would be great if you could give some feedback this weekend.

Anyone interested?

@DanielRosenwasser
Copy link
Member

@ivogabe I just emailed you so you can send me a copy.

@ivogabe
Copy link
Contributor Author

ivogabe commented Apr 4, 2017

November last year I finished the work on my thesis, and it was published in January. I got a 9.5 (out of 10) for it, and I'm really happy with that 😃. Thanks everyone here for the suggestions, help and feedback!

@ivogabe ivogabe closed this as completed Apr 4, 2017
@RyanCavanaugh
Copy link
Member

@ivogabe Congratulations!

@microsoft microsoft locked and limited conversation to collaborators Jun 19, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Discussion Issues which may not have code impact
Projects
None yet
Development

No branches or pull requests

7 participants