Skip to content

Changes from stanc2

Rok Češnovar edited this page Jan 27, 2021 · 40 revisions

In the process of implementing stanc3, several bugs were found in stanc2 and were fixed in stanc3. We record these here. (A general theme is that error messages have changed a lot and should have improved.)

Given its central role in the language, target should be a reserved keyword and should not be free to use as a variable.

We should require that the user write a semi-colon after a print-statement, just like after any other statement.

We have intentionally disallowed shadowing (masking) of variable names from an outer scope by ones in an inner scope almost everywhere in the language, so as not to confuse novice programmers. One place where we have not done so is for function arguments. I think it should be enforced there as well.

There is an inconsistency between the data only restrictions for the arguments to the ODE integrators between what is described in the manual and what is actually implemented in Stan. EDIT (wds15): The issue talks about the 3rd and 4th argument being required to be data. These are the initial time and the time-vector. These are allowed to be var as of Stan 2.19 (I think). It looks like stanc3 is doing the right thing and allowing a var for these arguments as it should be.

Currently, non-returning function calls without semi-colon are allowed as statements. This seems like a bug in the parser.

Conditional control flow constructs have strange parsing behaviour with block structure.

Stanc gives run-time errors when using function names (e.g. add) corresponding to operators (e.g. +), for certain signatures

This could either be a Stan Math error or a Stanc error.

Declare and define does not work at the top of a block without curly braces. When declare after assignment the msg is not informative.

Failure of a correct Stan model to compile.

parameters {
  real<lower = 0> sigma;
}
model {
  sigma ~ inv_gamma(1e-4, 1e-4);
  { -1.0, 0.0, 1.0 } ~ normal(0, sigma);
}

The stan::lang::compile function should have a std::string& argument that gets instantiated to the fully expanded .stan file after preprocessing.

The parser is producing a confusing warning for the following ill-formed Stan program:

transformed data {
  real x = atan2(2 ; 3);
}

If a user tries to use a function that does not exist or otherwise specifies it wrong, the parser says something like error in ... at line X, column Y where Y is the end of the line. This confuses RStudio's error flagging (which goes off Y) and makes it look like the problem is with whatever is the last argument to the function, rather than the function itself.

When the same identifier is used for both a fct name and var name, it misidentifies this as use of a reserved word.

Parser allows transpose of primitives in program - it would be better to flag this and fail.

We should alert users when they use non-ASCII characters outside of comments. This can be handled in the preprocessor and flagged as such.

When a Stan program has a model block name which is wrong, e.g. parameter instead of parameters, the parser error message is misleading:

PARSER EXPECTED: whitespace to end of file.
FOUND AT line <n>:

When exceptions are thrown in the logic of if/then statements the line number points to the first if statement and not the actual line where the exception was encountered.

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build.

Parser should promote ints to doubles when doing *= and /= with a vector (or matrix) on the left-hand side.

If a higher-order function does not exist, parser error message points to its functor argument instead of the higher-order function itself.

When a function is defined twice, the error message points to the function AFTER the second definition. A program defines the function "rot_matrix" twice in the "functions" block. The error message points at the beginning of the definition of a later function, "scale_matrix".

Can have errors in "a * b" that get translated as "multiply(a, b)" in the error messages, even though the user never wrote "multiply". This is an issue with any kind of intermediate form normalization (* being replaced with multiply here)---we need a path back to the actual user input.

Currently, various keywords can be used as variable names in Stan. This seems like a bad idea from a PL perspective. For instance, lower and upper can be used as variable names.

There is an inconsistency between the operator precedence as implemented in stanc2 and as described in the manual. The manual describes that .* binds more tightly than *. stanc2 seems to bind * more tightly.

Rstanarm has been patched (in test models) to not exploit shadowing/keyword overloading/operator precedence/semi-colon bugs in stanc2 anymore

By adding underscores to some variable names and adding extra parentheses to make precedence as intended and adding semi-colons where they should be.

The type checking for truncations in stanc2 does not always work. Some models where the truncation bounds should be integers, get accepted even if the bounds are real.

Currently, includes of more than 1 level deep are ignored. This seems impractical. Currently, you can include one Stan file a.stan from another b.stan. But we cannot have a.stan includes b.stan includes c.stan. The include of c.stan in b.stan will then be silently ignored by the current parser. At the very least, an error should be thrown if that is the desired behavior.

Some inputs to the ODE-solvers pass the Stan type check and fail on C++ compilation. Data only argument status is not propagated correctly through function calls.

Users need to write y ~ normal(0, 1) now instead of y ~ normal_lpdf(0, 1).

Previously you could write a user-defined function that called something like an ODE solver with a function argument that happened to be data without annotating that the function argument needed to be data. Now our typechecker prevents that.