Changes from stanc2

In the process of implementing stanc3, several bugs were found in stanc2 and were fixed in stanc3. We record these here. (A general theme is that error messages have changed a lot and should have improved.)

Target is a reserved variable name

Given its central role in the language, target should be a reserved keyword and should not be free to use as a variable.

Printing without a semi-colon is not allowed

We should require that the user write a semi-colon after a print-statement, just like after any other statement.

Function arguments should not be allowed to shadow

We have intentionally disallowed shadowing (masking) of variable names from an outer scope by ones in an inner scope almost everywhere in the language, so as not to confuse novice programmers. One place where we have not done so is for function arguments. I think it should be enforced there as well.

ODE solver type checking was wrong

The data-only restrictions on the arguments to the ODE integrators differ between what the manual describes and what Stan actually implements. EDIT (wds15): The issue talks about the 3rd and 4th arguments being required to be data. These are the initial time and the time vector. They are allowed to be var as of Stan 2.19 (I think). It looks like stanc3 is doing the right thing by allowing a var for these arguments.

Non-returning function calls should be followed by a semi-colon

Currently, non-returning function calls without a semi-colon are allowed as statements. This seems like a bug in the parser.

Parse error in if-else

Conditional control flow constructs have strange parsing behaviour with block structure.

Stanc gives run-time errors when using function names (e.g. add) corresponding to operators (e.g. +), for certain signatures

This could either be a Stan Math error or a Stanc error.

Variable declarations and statements can be mixed (see also this issue)

Declare-and-define does not work at the top of a block without curly braces. When a declaration follows an assignment, the error message is not informative.

Array constructors work with sampling statements

stanc2 failed to compile the following correct Stan model:

parameters {
  real<lower = 0> sigma;
}
model {
  sigma ~ inv_gamma(1e-4, 1e-4);
  { -1.0, 0.0, 1.0 } ~ normal(0, sigma);
}

Added pretty printing functionality, exposing a pre-processed version of the Stan program from the compiler

The stan::lang::compile function should have a std::string& argument that gets instantiated to the fully expanded .stan file after preprocessing.

Fixed parser warnings for semicolons in function arguments

The parser is producing a confusing warning for the following ill-formed Stan program:

transformed data {
  real x = atan2(2 ; 3);
}

Parser points to the name of non-existent functions rather than the end of the line

If a user tries to use a function that does not exist, or otherwise specifies it incorrectly, the parser says something like error in ... at line X, column Y, where Y is the end of the line. This confuses RStudio's error flagging (which goes off Y) and makes it look like the problem is with whatever is the last argument to the function, rather than with the function itself.

Correct error message when same identifier used for both function name and variable name

When the same identifier is used for both a function name and a variable name, the parser misidentifies this as use of a reserved word.

Parser flags use of _lxxf suffix on built-in functions in sampling statements

We disallow transpose operator on scalars

Parser allows transpose of primitives in program - it would be better to flag this and fail.

Throw informative error on non-ascii characters and other lexing errors

We should alert users when they use non-ASCII characters outside of comments. This can be handled in the preprocessor and flagged as such.
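A check of this kind can be sketched in a few lines. The Python below is an illustration only, not how stanc3 (which is written in OCaml) implements its lexer. The function name first_non_ascii is hypothetical, and for brevity the sketch recognizes only // and /* */ comments.

```python
def first_non_ascii(source):
    """Return (line, column, char) of the first non-ASCII character found
    outside // and /* */ comments, or None. Lines and columns are 1-based."""
    line, col = 1, 1
    i, n = 0, len(source)
    in_line_comment = False
    in_block_comment = False
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if in_line_comment:
            if ch == "\n":
                in_line_comment = False
        elif in_block_comment:
            if two == "*/":
                in_block_comment = False
                i += 1      # skip the '*'; the '/' is consumed below
                col += 1
        elif two == "//":
            in_line_comment = True
        elif two == "/*":
            in_block_comment = True
            i += 1          # skip the '/'; the '*' is consumed below
            col += 1
        elif ord(ch) > 127:
            return (line, col, ch)
        # Advance the position, tracking line and column.
        if ch == "\n":
            line, col = line + 1, 1
        else:
            col += 1
        i += 1
    return None


# Non-ASCII text inside a comment is ignored; the identifier is reported.
program = "// caf\u00e9 is fine in a comment\nreal \u03b1;\n"
print(first_non_ascii(program))
```

Reporting the line and column of the offending character, rather than only rejecting the program, is what makes the resulting error message actionable.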

Better error message for program block names which contain typos

When a Stan program has a program block name which is wrong, e.g. parameter instead of parameters, the parser error message is misleading:

PARSER EXPECTED: whitespace to end of file.
FOUND AT line <n>:

Improved line numbers in if/then errors

When exceptions are thrown in the logic of if/then statements, the line number points to the first if statement and not to the actual line where the exception was encountered.

Fixed Windows/Unix line ending problems in error message locations

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off for the model I'm trying to build.

Parser promotes int to double when self-multiplying or self-dividing a vector

Parser should promote ints to doubles when doing *= and /= with a vector (or matrix) on the left-hand side.

Better error message on non-existent higher-order function

If a higher-order function does not exist, the parser error message points to its functor argument instead of the higher-order function itself.

Better error message when function already defined

When a function is defined twice, the error message points to the function AFTER the second definition. A program defines the function "rot_matrix" twice in the "functions" block. The error message points at the beginning of the definition of a later function, "scale_matrix".

Errors use Stan syntax (by leveraging pretty printer to render AST properly)

Errors in "a * b" can be rendered as "multiply(a, b)" in the error messages, even though the user never wrote "multiply". This is an issue with any kind of intermediate form normalization (* being replaced with multiply here): we need a path back to the actual user input.

Keywords are reserved in stanc3

Currently, various keywords can be used as variable names in Stan. This seems like a bad idea from a programming-language perspective. For instance, lower and upper can be used as variable names.

Operator precedence has been fixed to be consistent with the functions reference

The operator precedence implemented in stanc2 is inconsistent with the precedence described in the manual. The manual states that .* binds more tightly than *, whereas stanc2 seems to bind * more tightly.
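To see why the grouping matters, here is a small worked example in plain Python (not Stan). The helpers mat_vec and elt_mul are hypothetical stand-ins for Stan's * (matrix times vector) and .* (elementwise product). The sketch only shows that the two possible parses of A * b .* c can give different values; it says nothing about how either compiler is implemented.

```python
def mat_vec(A, v):
    """Matrix-vector product, standing in for Stan's `*`."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def elt_mul(u, v):
    """Elementwise product, standing in for Stan's `.*`."""
    return [x * y for x, y in zip(u, v)]


A = [[1, 2], [3, 4]]
b = [1, 1]
c = [2, 3]

# `.*` binds more tightly:  A * (b .* c)
elementwise_first = mat_vec(A, elt_mul(b, c))

# `*` binds more tightly:  (A * b) .* c
matrix_first = elt_mul(mat_vec(A, b), c)

print(elementwise_first)  # [8, 18]
print(matrix_first)       # [6, 21]
```

Both groupings type-check, so a change in precedence alters the result without producing any error. Adding explicit parentheses removes the ambiguity.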

Rstanarm has been patched (in test models) to not exploit shadowing/keyword overloading/operator precedence/semi-colon bugs in stanc2 anymore

This was done by adding underscores to some variable names, adding extra parentheses to make the intended precedence explicit, and adding semi-colons where they belong.

Truncation type checking has been fixed

The type checking for truncations in stanc2 does not always work. Some models in which the truncation bounds should be integers are accepted even when the bounds are real.

Recursive includes are allowed (with checking for cycles)

Currently, includes more than one level deep are ignored, which seems impractical. You can include one Stan file a.stan from another file b.stan, but you cannot have a.stan include b.stan, which in turn includes c.stan: the include of c.stan in b.stan is silently ignored by the current parser. At the very least, an error should be thrown if that is the desired behavior.
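Recursive expansion with cycle checking can be sketched as follows. This is a minimal Python illustration under stated assumptions, not stanc3's actual implementation: the function expand_includes is hypothetical, files are modelled as an in-memory dict from name to contents, and an include line is assumed to be a line starting with #include followed by a file name.

```python
def expand_includes(name, files, stack=()):
    """Return the text of `name` with #include lines expanded recursively.

    `stack` holds the chain of files currently being expanded, so meeting
    a name that is already on it means the includes form a cycle."""
    if name in stack:
        chain = " -> ".join(stack + (name,))
        raise ValueError(f"include cycle: {chain}")
    if name not in files:
        raise FileNotFoundError(name)
    out = []
    for line in files[name].splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            # Accept both a bare file name and a quoted or bracketed one.
            target = stripped[len("#include"):].strip().strip('"<>')
            out.append(expand_includes(target, files, stack + (name,)))
        else:
            out.append(line)
    return "\n".join(out)


# a.stan includes b.stan, which includes c.stan (two levels deep).
files = {
    "a.stan": "#include b.stan\nmodel { }",
    "b.stan": "#include c.stan",
    "c.stan": "functions { }",
}
expanded = expand_includes("a.stan", files)
print(expanded)
```

Tracking only the current chain of includes, rather than every file seen so far, means the same file can still be included from two unrelated places, while a file that includes itself directly or indirectly raises an error.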
