Fix antlr/antlr4#1890 Standalone CR should be recognized as line separator by nixel2007 · Pull Request #58 · tunnelvisionlabs/antlr4

nixel2007 · 2019-11-22T07:05:25Z

Hello!

I've made changes to treat a stand-alone CR as a line separator.
These changes were tested on millions of lines of code without any noticeable performance regression. I also use my fork every day to parse files as part of a static analysis process.

P.S. Link to my PR to the original ANTLR4 repo, opened half a year ago...

daniellansun · 2019-11-22T07:12:20Z

The builds on CI servers failed

nixel2007 · 2019-11-22T07:47:02Z

Hi, @danielsun1106

I see an error in build logs:

testCharSetRange(org.antlr.v4.test.tool.TestLexerExec)  Time elapsed: 0.078 sec  <<< FAILURE!
org.junit.ComparisonFailure: expected:<...0]
[@1,4:5='34',<1>,[1:4]
[@2,7:8='a2',<2>,1:7]
[@3,10:12='abc',<2>,1:10]
[@4,18:17='<EOF>',<-1>,2]:3]
> but was:<...0]
[@1,4:5='34',<1>,[2:1]
[@2,7:8='a2',<2>,2:4]
[@3,10:12='abc',<2>,2:7]
[@4,18:17='<EOF>',<-1>,3]:3]
>
	at org.junit.Assert.assertEquals(Assert.java:115)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.antlr.v4.test.tool.TestLexerExec.testCharSetRange(TestLexerExec.java:497)

But I don't understand what exactly the numbers after the second comma mean. Could you give me some guidance on this?

I assume the first number is the token type. What about the second and third ones?

daniellansun · 2019-11-22T09:31:56Z

@nixel2007 I can not remember its meaning either... @sharwell should be able to tell us ;-)

sharwell · 2019-11-22T14:38:21Z

[@2,7:8='a2',<2>,2:4]

| Item | Meaning |
| --- | --- |
| `@2` | Token index (this is the third token) |
| `7:8` | Token span (character position) in the original input |
| `a2` | Token text |
| `<2>` | Token type |
| `2:4` | Starting line:column for the token |
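
To illustrate the layout in the table above, here is a minimal Java sketch that splits one of these token strings into its fields. The class name `TokenDisplay` and its method are hypothetical and not part of ANTLR, and the pattern assumes the exact shape shown in this thread (no extra fields such as a channel).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: decodes strings shaped like
// [@index,start:stop='text',<type>,line:column] as seen in the test log.
final class TokenDisplay {
    private static final Pattern FORMAT = Pattern.compile(
            "\\[@(-?\\d+),(-?\\d+):(-?\\d+)='(.*)',<(-?\\d+)>,(\\d+):(\\d+)\\]",
            Pattern.DOTALL);

    // Returns a labelled, human-readable form of a token display string.
    static String describe(String display) {
        Matcher m = FORMAT.matcher(display);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized token display: " + display);
        }
        return "index=" + m.group(1)
                + " span=" + m.group(2) + ".." + m.group(3)
                + " text=" + m.group(4)
                + " type=" + m.group(5)
                + " line=" + m.group(6)
                + " column=" + m.group(7);
    }

    public static void main(String[] args) {
        System.out.println(describe("[@2,7:8='a2',<2>,2:4]"));
    }
}
```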

Note that I'm not planning to take this change. It should be possible to derive from the existing types in the library if you need to customize the token positioning mechanism. (It would have been better if tokens didn't track this information at all, but it's too late to remove now.)

nixel2007 · 2019-11-22T16:37:12Z

Aha, the test string contains a standalone CR, so the second 34 token is now placed on the 2nd line. So the new numbers are correct.

antlr4/tool/test/org/antlr/v4/test/tool/TestLexerExec.java

Line 486 in 5392011

String found = execLexer("L.g4", grammar, "L", "34\r 34 a2 abc \n ");

It should be possible to derive from the existing types in the library if you need to customize the token positioning mechanism.

Could you point me to these classes/methods? I haven't dug into it much, but I suppose that if I try to change the token position anywhere other than LexerATNSimulator#consume, I will have to store line/character offsets and adjust them for every token, rather than making a one-time change in consume. That is good for neither RAM nor CPU.

sharwell · 2019-11-22T16:52:04Z

@nixel2007 LexerATNSimulator.consume is a virtual method. Create your own type derived from that, and override consume to handle this case.
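
The position bookkeeping that such an override would have to change can be sketched without the ANTLR runtime. The class below is a hypothetical stand-alone model, not the library's `consume` and not the code from this PR: it advances a line/column counter one character at a time and, when `crIsLineBreak` is set, also starts a new line on a CR that is not followed by LF. It assumes 1-based lines and 0-based columns, as in the test output above.

```java
// Hypothetical stand-alone model of per-character position tracking.
// It only mirrors the idea of updating line/column while consuming input.
final class CrAwarePosition {
    private final boolean crIsLineBreak;
    private int line = 1;    // 1-based line number
    private int column = 0;  // 0-based column within the line

    CrAwarePosition(boolean crIsLineBreak) {
        this.crIsLineBreak = crIsLineBreak;
    }

    // Advances the position past the character at the given index.
    void consume(String input, int index) {
        char c = input.charAt(index);
        // A CR counts as "lone" when it is not the first half of CR LF.
        boolean loneCr = c == '\r'
                && (index + 1 >= input.length() || input.charAt(index + 1) != '\n');
        if (c == '\n' || (crIsLineBreak && loneCr)) {
            line++;
            column = 0;
        } else {
            column++;
        }
    }

    // Returns "line:column" of the character at the given offset.
    static String locate(String input, int offset, boolean crIsLineBreak) {
        CrAwarePosition position = new CrAwarePosition(crIsLineBreak);
        for (int i = 0; i < offset; i++) {
            position.consume(input, i);
        }
        return position.line + ":" + position.column;
    }

    public static void main(String[] args) {
        String input = "34\r 34 a2 abc \n ";
        System.out.println(locate(input, 4, false)); // LF-only line counting
        System.out.println(locate(input, 4, true));  // lone CR also ends a line
    }
}
```

With the test input from this thread, the second `34` token (offset 4) moves from `1:4` to `2:1` once a lone CR is counted, which matches the difference in the failing test. A CR LF pair still produces a single line break, because only the LF advances the line in that case.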

nixel2007 · 2020-01-19T13:43:42Z

Create your own type derived from that, and override consume to handle this case.

I did as you said, thank you.
One small question: is there a way to override the default constructor from the g4 file?
The only way I've found is to add a constructor with a different signature:

1c-syntax/bsl-parser@86bbfc5#diff-9f09662df115bb4cb45e8ad2afc23bbdR25

sharwell · 2020-01-19T18:05:41Z

You can generate your grammar as abstract if you wish to provide a different constructor in code. I forget the exact command line syntax but it's handled by this code:

antlr4/tool/src/org/antlr/v4/tool/Grammar.java

Lines 625 to 627 in bdcd934

    
    public boolean isAbstract() {
        return Boolean.parseBoolean(getOptionString("abstract"));
    }

nixel2007 · 2020-01-19T18:08:43Z

OK, I'll try it. Thank you!

nrmancuso · 2021-06-27T14:21:59Z

@nixel2007 did you end up successfully overriding the LexerATNSimulator constructor by generating your grammar as abstract to solve this issue? Do you have a working example that you can point me to?

nixel2007 · 2021-06-27T16:05:26Z

@nmancus1 Yep. I've created CRAwareLexerATNSimulator (https://github.com/1c-syntax/bsl-parser/blob/master/src/main/java/com/github/_1c_syntax/bsl/parser/CRAwareLexerATNSimulator.java) and pass it to the lexer in the members block: https://github.com/1c-syntax/bsl-parser/blob/855158d4e7948326f76cf32b33ebd16c580896be/src/main/antlr/BSLLexer.g4#L29-L35. There is no need to make the grammar abstract.

nixel2007 added 2 commits November 21, 2019 16:09

Standalone CR should be recognized as line separator [java]

fde6a49

Contributors sign

f35e480

nixel2007 closed this Jan 19, 2020

nixel2007 wants to merge 2 commits into tunnelvisionlabs:master from nixel2007:feature/standalone-cr
