-
-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stan code canonicalizer #390
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good - was this your first ocaml?? We just needs some tests. You could make a new canonicalizer test folder and add a dune
file that is really similar to any of the ones in good
with stanzas for pretty printing (but with the argument changed from --pretty-print
to whatever.
src/frontend/Canonicalize.ml
Outdated
let deprecated_distributions = String.Table.create () | ||
|
||
let () = | ||
Hashtbl.add_multi deprecated_functions ~key:"multiply_log" ~data:"lmultiply" ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional nit: we don't actually need Hashtbl with its mutability; we could just create an immutable Map here. I know we use this Hashtbl pattern elsewhere for the immutable stan math signatures - not sure if that's a pattern we want to keep using or not even for really large amounts of data.
Map would look something like:
let deprecated_distributions = String.Map.of_alist_multi
[("multiply_log", "lmultiply")
; ("binomial_coefficient_log", "lchoose")
...
]
@@ List.concat_map Middle.Stan_math_signatures.distributions
~f:(fun (fnkinds, name, _) ->
 List.map fnkinds ~f:(function
 | Lpdf -> (name ^ "_log", name ^ "_lpdf")
...
src/frontend/Canonicalize.ml
Outdated
let rename_distribution name = | ||
match Hashtbl.find deprecated_distributions name with | ||
| None | Some [] -> name | ||
| Some (rename :: _) -> rename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we only use the first element of the value, why are we using a multi map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's my first Ocaml and I have no idea what I'm doing :P. The obvious alternative Hashtbl.add
had a weird return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, this is great for a first run! Would you mind changing it? You can use add_exn
instead.
| GetLP -> GetTarget | ||
| FunApp (f, {name; id_loc}, e) -> | ||
if is_distribution name then | ||
CondDistApp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the structure of the tree (replacing a FunApp with a CondDisApp). Is that because sometimes we have incorrectly labeled distributions as FunApps prior to this stage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deprecated density functions with _log
suffix are real functions and are called as such. They're replaced by _lpdf
suffix functions and those require the vertical bar syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah. Stan3 says this code is incorrect
target += std_normal_lpdf(x);
and expects instead
target += std_normal_lpdf(x|);
but Stan2 can't parse that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, looks like the former should be okay. Let me make an issue for this.
src/stanc/stanc.ml
Outdated
@@ -83,6 +84,9 @@ let options = | |||
; ( "--auto-format" | |||
, Arg.Set pretty_print_program | |||
, " Pretty prints the program to the console" ) | |||
; ( "--canonize" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hahaha. What about --print-canonical
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, sounds good.
Thanks for the review. A question I'd like input on is whether to use typed or untyped AST. For now it's untyped because that's what the pretty printer expects by default but should be easy switch. Either way, there's a problem:
|
An interesting conundrum! I think a good answer to this in the future might be that we split up type inference from type checking. That would make post-inference a really logical place for this phase. @enetsee has talked about this before but is pretty busy atm. At the moment, could it make sense to have two phases in canonicalize, one pre and one post checking? |
Hmm. I don't think that's my fault. All tests pass on my computer. Anyway, I think this is basically done. A few more thoughts: User defined functions with The last remaining deprecated feature is
Strangely, Stan3 does not emit a warning when compiling Another transform I considered is combining declare and define.
could be changed to
Makes the code a bit more compact. Now that declarations are allowed anywhere it's easy to simply defer the declaration until the first assignment, if it is within the same basic block. This is safe assuming the code does not use uninitialized variables. (Moving stuff between basic blocks would require complex analysis that's not available in AST.) |
Hm, let me take a look at that failure - agreed it's probably not related to the code here. |
What do you need the global symbol table for exactly? It seems off the cuff like it could be better to make an isolated pass that works on an AST than to couple this to the semantic check pass, even if that saves some duplicated work.
Seems like it should! Would you mind filing an issue? I don't have a strong opinion that declare-and-define is always superior, but seems fine to me. If there's a revolt we can always remove it :) But right now the code can rely on uninitialized variables being initialized to NaN (which is a feature I hate and would love to change). We might be able to change that already, I think there's only a couple of MPI tests in CmdStan that rely on that. We do want to move stuff between basic blocks eventually :) Definitely for optimization and possibly also for canonicalization, not sure. |
Yeah, I don't know what I was thinking. The only problem with renaming user defined functions is that if there's a function or variable with the same name anywhere in the model it won't compile. But having something called I haven't looked at the optimizer; I assume it handles many complex code transformations already. But it does all its work with the MIR so the results cannot be printed as Stan code. There's an open issue for missing deprecation warnings. #165 I think I withdraw the declare-and-define thing. Cleaning up deprecated code is enough for one PR. |
Thanks! Will get to this today or Wednesday I hope - uncharacteristically busy the past week and the next week or so. |
String.drop_suffix name 4 ^ "_lpdf" | ||
else String.drop_suffix name 4 ^ "_lpmf" | ||
in | ||
String.Table.add deprecated_userdefined ~key:name ~data:newname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually want to ignore duplicates or do we want to do something with them? I'll merge for now but just food for thought going forwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicates come from forward declarations (which is needed if the function is recursive). The signatures given in function declaration and definition must be identical and no other function may have the same name.
This looks great, congrats on your first OCaml code and PR to the repo :) I'll add you to the contributors list. |
Based on issue #262.
A new command line option that removes deprecated syntax from Stan code and pretty prints the result. No attempt is made to infer priors.
I still need to figure out what tests are required. The pretty printer seems to have nothing but a
pretty.expected
file in every subdirectory oftest/integration/good
. How are those generated?Copyright holder: Niko Huurre
By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses: