Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add tests for multi-platform end-of-line behavior #1416

Closed
betanalpha opened this issue Apr 1, 2015 · 8 comments
Closed

add tests for multi-platform end-of-line behavior #1416

betanalpha opened this issue Apr 1, 2015 · 8 comments
Assignees
Milestone

Comments

@betanalpha
Copy link
Contributor

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122:      beta_theta ~ normal(0,2);
123:      
124:      alpha_mu ~ normal(0,2);
                                                           ^
125:      alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2);
122: 
123: alpha_mu ~ normal(0,2);
124: alpha_sigma ~ normal(0,2);
125: alpha_theta ~ normal(0,2);
126:
127: // Measurements
128: for(n in 1:N_ooc) {
129:   ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C));
130:   ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T));
131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.

@betanalpha
Copy link
Contributor Author

Possibly comments aren't getting counted in terms of line numbers?

@bob-carpenter
Copy link
Contributor

I haven't and just tried a few examples and it seems fine
for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime
numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have
different line ends, and that may also be messing things
up somehow.

  • Bob

On Apr 2, 2015, at 8:58 AM, Michael Betancourt [email protected] wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2);
123:
124: alpha_mu ~ normal(0,2);
^
125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2);
122:
123: alpha_mu ~ normal(0,2);
124: alpha_sigma ~ normal(0,2);
125: alpha_theta ~ normal(0,2);
126:
127: // Measurements
128: for(n in 1:N_ooc) {
129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C));
130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T));
131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.


Reply to this email directly or view it on GitHub.

@betanalpha
Copy link
Contributor Author

I’m on a Mac — runtime errors are definitely off, too.
The model/data aren’t public so I’ll try to whip up a
pathological model tomorrow.

On Apr 2, 2015, at 12:53 AM, Bob Carpenter [email protected] wrote:

I haven't and just tried a few examples and it seems fine
for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime
numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have
different line ends, and that may also be messing things
up somehow.

  • Bob

On Apr 2, 2015, at 8:58 AM, Michael Betancourt [email protected] wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2);
123:
124: alpha_mu ~ normal(0,2);
^
125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2);
122:
123: alpha_mu ~ normal(0,2);
124: alpha_sigma ~ normal(0,2);
125: alpha_theta ~ normal(0,2);
126:
127: // Measurements
128: for(n in 1:N_ooc) {
129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C));
130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T));
131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub.

@betanalpha
Copy link
Contributor Author

Turns out it’s Windows newlines — the model was originally
written on a Windows box. Is there any way for the parser
to aggregate newline types and warn the user of potential
biases in the reported line numbers?

On Apr 2, 2015, at 1:03 AM, Michael Betancourt [email protected] wrote:

I’m on a Mac — runtime errors are definitely off, too.
The model/data aren’t public so I’ll try to whip up a
pathological model tomorrow.

On Apr 2, 2015, at 12:53 AM, Bob Carpenter [email protected] wrote:

I haven't and just tried a few examples and it seems fine
for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime
numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have
different line ends, and that may also be messing things
up somehow.

  • Bob

On Apr 2, 2015, at 8:58 AM, Michael Betancourt [email protected] wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2);
123:
124: alpha_mu ~ normal(0,2);
^
125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2);
122:
123: alpha_mu ~ normal(0,2);
124: alpha_sigma ~ normal(0,2);
125: alpha_theta ~ normal(0,2);
126:
127: // Measurements
128: for(n in 1:N_ooc) {
129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C));
130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T));
131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub.

@bob-carpenter
Copy link
Contributor

What do you want the behavior to be?

The line counting is happening below the level of the parser
in the iterator that Spirit uses. And it'd be hard to change
there. But the good news is we have the whole string representing
the program before it gets used to create the iterators.

Here's the iterator getting created in lang/parser.hpp:

  std::ostringstream buf;
  buf << input.rdbuf();
  std::string stan_string = buf.str();

  // ***** could add warnings or preprocessing here *****

  typedef std::string::const_iterator input_iterator;
  typedef boost::spirit::line_pos_iterator<input_iterator> lp_iterator;

  lp_iterator fwd_begin = lp_iterator(stan_string.begin());
  lp_iterator fwd_end = lp_iterator(stan_string.end());

It'd be easy to drop code in where I wrote the comment above.

  • Bob

On Apr 2, 2015, at 8:56 PM, Michael Betancourt [email protected] wrote:

Turns out it’s Windows newlines — the model was originally
written on a Windows box. Is there any way for the parser
to aggregate newline types and warn the user of potential
biases in the reported line numbers?

On Apr 2, 2015, at 1:03 AM, Michael Betancourt [email protected] wrote:

I’m on a Mac — runtime errors are definitely off, too.
The model/data aren’t public so I’ll try to whip up a
pathological model tomorrow.

On Apr 2, 2015, at 12:53 AM, Bob Carpenter [email protected] wrote:

I haven't and just tried a few examples and it seems fine
for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime
numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have
different line ends, and that may also be messing things
up somehow.

  • Bob

On Apr 2, 2015, at 8:58 AM, Michael Betancourt [email protected] wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2);
123:
124: alpha_mu ~ normal(0,2);
^
125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2);
122:
123: alpha_mu ~ normal(0,2);
124: alpha_sigma ~ normal(0,2);
125: alpha_theta ~ normal(0,2);
126:
127: // Measurements
128: for(n in 1:N_ooc) {
129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C));
130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T));
131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub.

@betanalpha
Copy link
Contributor Author

So are the Windows newlines messing up Spirit directly?
Or is it having a mix of Unix and Windows newlines?

In the former case having a comment like “I detected Windows newlines
so be careful with the error message” would be huge given
the relatively large number of Windows users that we have.
In the latter having a “I found mixed newlines, be careful…”
would be great.

On Apr 2, 2015, at 1:35 PM, Bob Carpenter [email protected] wrote:

What do you want the behavior to be?

The line counting is happening below the level of the parser
in the iterator that Spirit uses. And it'd be hard to change
there. But the good news is we have the whole string representing
the program before it gets used to create the iterators.

Here's the iterator getting created in lang/parser.hpp:

std::ostringstream buf;
buf << input.rdbuf();
std::string stan_string = buf.str();

// ***** could add warnings or preprocessing here *****

typedef std::string::const_iterator input_iterator;
typedef boost::spirit::line_pos_iterator<input_iterator> lp_iterator;

lp_iterator fwd_begin = lp_iterator(stan_string.begin());
lp_iterator fwd_end = lp_iterator(stan_string.end());

It'd be easy to drop code in where I wrote the comment above.

  • Bob

On Apr 2, 2015, at 8:56 PM, Michael Betancourt [email protected] wrote:

Turns out it’s Windows newlines — the model was originally
written on a Windows box. Is there any way for the parser
to aggregate newline types and warn the user of potential
biases in the reported line numbers?

On Apr 2, 2015, at 1:03 AM, Michael Betancourt [email protected] wrote:

I’m on a Mac — runtime errors are definitely off, too.
The model/data aren’t public so I’ll try to whip up a
pathological model tomorrow.

On Apr 2, 2015, at 12:53 AM, Bob Carpenter [email protected] wrote:

I haven't and just tried a few examples and it seems fine
for line numbers.

Can you give me the whole program and I can try it here?

Whatever's causing this issue would also cause runtime
numbers to be off, since they're using the same counting.

Are you running this Windows or Linux or Mac? They all have
different line ends, and that may also be messing things
up somehow.

  • Bob

On Apr 2, 2015, at 8:58 AM, Michael Betancourt [email protected] wrote:

I'm using a fresh copy of dev (both Stan and CmdStan). The new parser error line numbers look great, but they're consistently off the model I'm trying to build. For example,

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "logmu_ooc_C" does not exist.

ERROR at line 124

122: beta_theta ~ normal(0,2);
123:
124: alpha_mu ~ normal(0,2);
^
125: alpha_sigma ~ normal(0,2);

while the model itself is given by

121: beta_theta ~ normal(0,2);
122:
123: alpha_mu ~ normal(0,2);
124: alpha_sigma ~ normal(0,2);
125: alpha_theta ~ normal(0,2);
126:
127: // Measurements
128: for(n in 1:N_ooc) {
129: ooc_count_C[n] ~ neg_binomial_2(exp(logmu_ooc_C), exp(log_sigma_ooc_C));
130: ooc_count_T[n] ~ neg_binomial_2(exp(log_mu_ooc_T), exp(log_sigma_ooc_T));
131: }

The parser should be identifying line 129.

Has anyone seen inconsistencies like this? If someone can confirm then I'll put together a simple reproducible example.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub.


Reply to this email directly or view it on GitHub.

@bob-carpenter
Copy link
Contributor

I think the parser/iterator should treat end-of-line markers
from any OS as end-of-line markers. That ensures we get the
same behavior from the same byte stream cross platform.

I'd guess that's what's happening now, but we need to add
tests as part of resolving this issue and to figure out what's
going on.

  • add tests for behavior with mixed end-of-line markers

Then we can follow your second suggestion:

  • provide a warning if there are mixed end-of-line markers

I can take on doing both of these.

  • Bob

@syclik syclik modified the milestone: Future Jun 11, 2015
@bob-carpenter bob-carpenter self-assigned this May 16, 2017
@bob-carpenter bob-carpenter modified the milestones: 2.15.0++, Future May 16, 2017
@seantalts seantalts modified the milestones: 2.16.0, 2.17 Jul 5, 2017
@seantalts seantalts modified the milestones: 2.17.0, 2.17.0++ Sep 6, 2017
@bob-carpenter bob-carpenter changed the title Parser Error Line Numbers Off? add tests for multi-platform end-of-line behavior Jun 3, 2018
@seantalts seantalts modified the milestones: 2.18.0, 2.18.0++ Jul 13, 2018
@VMatthijs
Copy link
Member

This is still an issue in stanc2, from my experience.

It's been fixed in stanc3, however.

@mitzimorris mitzimorris removed this from the 2.18.1 milestone Dec 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants