-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A more diligent name resolution? #10403
Comments
There are non-constant global variables. Storage locations for such variables are resolved early, but their values (including type) can change at run time. Given a Symbol or SymbolNode in an AST, you need to look at the function's list of local and closed variables. If the symbol does not appear there, static parameters are examined next. These are stored in the Using that procedure, I believe all variable references can be resolved. In case I haven't understood, it would help to provide an example of an "unresolved" name. |
I'm not saying that names cannot be resolved following the procedure you described, I'm suggesting that all names (of Symbol type) be resolved to either SymbolNode or GetfieldNode when Julia first produce typed ASTs, so that no Symbol remains in the them (except in Expr head positions). Since type inference already can find out exact location (which module) of a name, it can save future effort of repeating this process, especially when dealing with situations where in a AST rewriting function, we don't know which module a piece of AST comes from. It is also quite troublesome to safely copy ASTs from one place to another. The current implementation is not very consistent about resolving names. For example:
We can see a few inconsistencies above:
Another example would be:
If we dump the typed AST of both, we can see that f contains an unresolved copy! call, while the AST of g contains a fully resolved copy! as GetfieldNode. So it appears that somehow name is only resolved to avoid conflict? Why not always resolve them? PS: I don't quite understand the decision of allowing globals to change type, though this is a different topic. In an interactive shell, newer definitions of a non-function variable should shadow previous definitions, which I believe is a different case from running code defined in a fully enclosed module. If a user really want a variable's type to be anything, he/she should explicitly say it is of Any type. |
The reason we do this is efficiency. The number of symbols over all ASTs is very large. If a function contains, say, 100 function calls, we would have to create 100 extra nodes that simply repeat the information that all of these names are found in the same module as the enclosing function. Instead we can just refer to an AST, and a single pointer to a module. It's much more efficient to remember what module the AST came from than to repeat the information in every global symbol reference. |
Also there is a function |
If GetfieldNode is immutable, then we can use techniques like hashconsing to reduce memory overhead. A similar situation is with type annotation, is it really necessary to introduce SymbolNode when Symbol's type can be looked up in the environment? There are always trade-offs, and more convincing arguments can be made by measuring real overheads. |
SymbolNodes are used because a variable's type can be different at different points. We keep lots of compiler data structures in memory during execution, and it's kind of a problem already. Better than hash-consing is to eliminate unnecessary objects entirely. Why is it so hard to keep track of what module an AST came from? If you want I can write a small wrapper around |
Like I said, sometimes it is very difficult to figure out where a piece of code comes from. Here is a short example using macros:
If I understand it correctly, the purpose of using esc(x) is to allow the names in the AST (value of x) resolve in the call site of applying the macro. It is apparently not working (causing an error), and sin didn't resolve to the intended A.sin function. I admit the example was abusing the macro facility a bit. But situations like this do come up often in our use cases because names in an AST are not always accompanied by their locations. |
Once you have the |
Actually after thinking about it more, I'd argue that resolved names should stay in AST, and resolved types should stay out of AST. Once you know the location of a variable (local or module), it is easy to look up its type. But the reverse is not as easy, because you have to walk the module hierarchy and perform a lookup in every module you visit. Also, after inference, type castings become explicit, so it is wasteful to annotate every variable occurrences with a type. |
I'm pretty sure this is not true. What do you mean by "type castings"? Probably some SymbolNodes are redundant, but certainly not all. A julia local variable does not always have a fixed type; different occurrences can have different types. In fact, we already avoid creating SymbolNodes in many cases, for example for constant globals, since their types can easily be looked up just as you say. I don't believe you ever need to walk the module hierarchy yourself. A module and a symbol are sufficient, e.g. to call |
Sorry, I wasn't very clear. I was talking about the cost of doing name resolution, not what Julia function one has to call to resolve names. Like I said, if a local variable can be assigned different types within the same body, it should either be declared as an union type or Any type. Its use can be accompanied by an explicit casting (for now, you can think of a SymbolNode where the type is different than the declared type of this symbol). If you are concerned about space usage, adding one GetfieldNode per definition (in a module) and just returning the same object in every successful lookup sounds to me like a reasonable solution. I doubt it will blow up space usage compared to the current approach. If you are willing to give it a try, we can find someone who has free cycles to do this kind of experiment, and back it up with real data. |
Removing the SymbolNodes for variables that are always the same type is a good idea. I'll plan to do that. I still don't see why the GetfieldNodes are needed. Given an AST's module, putting a GetfieldNode with that module on every symbol inside it seems equally silly to me as putting a SymbolNode with the same type on every use of a variable. |
At least in the language implementations that I'm familiar with, both MLton and GHC do this in their ASTs. Local definitions have short names, but otherwise are always qualified with where it comes from. In addition to making things more consistent (name resolution is done once and for all), I think it is also a nice trade off between space (actually quite minimum if same name objects are reused) and efficiency (having to walk module hierarchy for a lookup). Overall it reduces complexity of working with ASTs. I understand that Julia strives to bring JIT performance, and perhaps it has not required sophisticated transformations over typed ASTs. |
Is the problem that you want the information at macro-expansion time? |
Yes, we are abusing macros to quickly generate functions like the example above, and want to be safe at handling AST pieces going into it. |
Whether we should have all names resolved at macro expansion time is kind of debatable. But I do think it is a bit more hygiene, especially when used with esc. |
I think you should try to find a different approach than using As an example, we might have
where |
Points taken. Name resolution should be after lowering. Actually the perfect place is in type inference, because it has to resolve all names anyway to figure out their definitions and types. In referring to the specific example of using esc in macros that I gave above, a solution would be to keep track of the location (call site of a macro application) in the :escape Expr, so that at the type inference stage, it knows where to look up the escaping names. |
IIUC, in your example |
It indeed depends on what should be the intended semantics of using esc with a macro argument expression out of its original context. Here is my attempt at defining this behavior. When used in a left hand side like you have pointed out, maybe it should not resolve at all and become a local variable, maybe it should just resolve and then become an error. When used in a non-LHS position, it resolve as if it were in the original context and again two situations would follow: (1) if the name resolves to a local definition in the original context, it can either become an error or cause the local definition to become an environment variable (as if captured by a closure); (2) if the name resolves to module level definition, it becomes a GetfieldNode. |
Actually thinking about it again, in the LHS case, it should always resolve to either a local or global definition in the original context, the same as in the non-LHS case. It is just that when it resolves to a local, the original definition should now become something that can be captured by closures. Now, to implement such semantics may not be very straightforward, and it is pretty much a corner case anyway. |
I modified the example to not use
If I remove the esc, it becomes more interesting:
Apparently it didn't resolve sin to A.sin. Is it a bug? I believe macro expansions are allowed to contain local function definitions, but somehow the name resolution is messed up when splicing ASTs into their body. UPDATE: my bad, this is not a bug (the one above it is), because without esc, names should resolve with respect to the macro's environment. |
|
My bad. So it is really only a problem when used out of context, like Anyway, I hope Julia developers can consider the option of having name resolved once and for all. |
@ninegua Please be careful when talking about macros on github; writing |
I think having names fully resolved at macro-expand time is inherently incompatible with the kind of macros we have. |
@quinnj I didn't know about this, thanks for the tip. @JeffBezanson I wasn't saying at macro expansion time, I meant it to be during type inference, and all typed ASTs will then have either fully resolved names, or locally defined names. |
Ok, how about the utility function I offered (wrapping |
Though I still don't really see the problem. I agree with your argument about type annotations, but I think the same argument applies to GetfieldNodes. Somebody else might find it very convenient to have the types marked everywhere, to save a lookup step. Given a symbol, you will have to do a lookup to find its type. By doing that, you also discover whether it is a local variable. If it's not a local variable, you know what module to look in. I just don't think this procedure is difficult or expensive. |
To be clear, when you see |
However what we could do is rename |
If there's going to be some effort put into this, is there any potential synergy in addressing #1334 simultaneously? At one point there was the suggestion that we should just include complete function/file info for each line number, rather than only for the first line. |
I think it's separate from line numbers, but very much related to #9023 (variable renaming). |
… params part of #10403 ast.args[2][1] has been removed since all of its information was also in the varinfo lists ast.args[2][2]. the varinfo lists are now at ast.args[2][1]. args[2][2] is captured var info, args[2][3] is gensym info, and now args[2][4] is a list of static parameter names. GlobalRef is now built in so it can be used more extensively.
Thanks, Jeff! I'm trying out the latest master, and I'm glad to see the changes in typed ASTs. However, I notice the following expression is still not named resolved:
Would appreciate if .* can be resolved to Base module. Actually a related question that I have is, how do you write something like methods(Base..*) with correct syntax? EDIT: After a closer look, actually the above .* is pretty printed as ., but the real AST is Main... Why is it not Base..*? |
I decided to print operators as just their names, since otherwise the printed ASTs are hard to read. I admit it's awkward, but you can write I'd like to resolve it to Base..*, but currently in some cases this forces bindings to be resolved too early. I'll keep working on it. |
It looks like resolution for operators is different than normal names. Here is a program that demonstrate this issue. I attempt to re-define
As seen above, the definition of An additional question is, why isn't |
This is because modules do The AST contains |
Thanks for the explanation. I'd appreciate if there is a way to turn off On the second issue, I believe here is what you meant:
At printing, the AST of So, am I correct to assume that once a module is completely loaded, evaluating all |
Paul, Check out baremodules: -Jacob On Tue, Jul 14, 2015 at 1:18 PM, Paul Liu [email protected] wrote:
|
@quinnj Thanks for the suggestion, Here is what I tried:
So I've managed to define a |
I'm trying
Both points can be illustrated by the following program:
We can see that What I had hoped for was to resolve both to module
For non-functions such as the use of The fact that Julia's name resolution is only "final" until compile time is also a bit surprising. I don't know if there is a solution to make everybody happy, but the following demonstrates what is going on:
I find it intriguing that there is a warning for module B, but not for module C. |
Actually here is what happens when I use
If I understand it correctly, the intended use of The same goes to |
Just want to add a comment that I really do think the name resolution should be final at type inference time (instead of compile time in codgen), otherwise we risk the danger of resolving it to a different type. The follow shows this bug:
One may argue that the above is not a valid program because function
So instead of merely doing type inference, I forced compilation of I might be splitting the hair here, but the above illustrates a name resolution gap between inference and codegen. The thing caused the crash is because type inference is making an assumption that the call to I understand that forcing inference and compilation can sometimes result in surprising behaviors, but at the same time I hope I've just made the case that name resolution should be final at type inference time, so that GlobalRefs would never be resolved to different things when being compiled later. By "final", I mean it resolves to the exact place where the matching method is defined. This is where inferencer finds the type, and where codegen finds the method. |
Before I raise the problem, please help clarifying two of my understandings:
If both the above are true, then the following shall be their logical consequences:
I put "statically" in quotes because Julia is certainly dynamic, but on the other hand, in the current implementation, once a Julia function is compiled, it stays compiled, and a compiled function does not attempt to resolve names at runtime.
With the above understanding, may I request the following feature to be implemented?
This will save a lot of trouble of implementing optimizations leveraging type information available in typed AST, because typed AST will no longer contain unresolved names, and all resolved names will only be of 3 forms: TopNode (for system definitions), GetfieldNode (for module-level definitions), SymbolNode (for local definitions). In all cases, it would be straightforward to obtain the type of a name.
The text was updated successfully, but these errors were encountered: