Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix out-of-bound access to "alex_check" array #223

Merged
merged 2 commits into from
Apr 3, 2023

Conversation

hsyl20
Copy link
Contributor

@hsyl20 hsyl20 commented Mar 31, 2023

offset can be negative. We must check this before using it to index into alex_check array.

Caught with GHC's JS backend (native GHC would only segfault in rare circumstances).

@hsyl20 hsyl20 force-pushed the hsyl20/fix-offset branch from 5c91321 to 3e87dfc Compare March 31, 2023 16:07
Copy link
Member

@andreasabel andreasabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Could you please add a testcase that demonstrates the bug and how it is fixed by your patch?

= alexIndexInt16OffAddr alex_table offset

| otherwise
= alexIndexInt16OffAddr alex_deflt s
Copy link
Member

@andreasabel andreasabel Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is too early in the morning and my brain hasn't woken up... but I cannot see the semantic difference between the new and the old code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have explained more ;)

The issue is that 'check' is unlifted, hence it's a strict binding. The way it was written before, it was evaluated before checking that offset was greater than or equal to 0. I've just moved the binding after the check.

Adding a test case for this seems difficult. We could add an assertion in the indexing function in debug mode that the offset is positive, but that seemed overkill. I've found this bug in the lexer code generated for Cabal-syntax. Any use of this code with the JS backend fails (the JS backend doesn't have an addressable heap, instead 'alex_check' array is represented as a JS array, so out-of-bound access triggers an exception).

Copy link
Member

@andreasabel andreasabel Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that is subtle indeed. I unlearned ML a long time ago... ;-)

  • Could you add your explanation to the commit message?
  • Could you add a CHANGELOG entry for this?
  • Do you need a release of the fix?
  • (Let's skip the test case as the change obviously makes the code more correct.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

A release would be nice as I'll have to fix Cabal-syntax's lexer.

Copy link
Member

@andreasabel andreasabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@andreasabel
Copy link
Member

@Ericson2314 : Could you please add me to the maintainers: https://hackage.haskell.org/package/alex/maintainers/

hsyl20 and others added 2 commits April 3, 2023 14:25
`offset` can be negative. We must check this before using it to index
into `alex_check` array.

The issue is that 'check' is unlifted, hence it's a strict binding. The
way it was written before, it was evaluated before checking that offset
was greater than or equal to 0. I've just moved the binding after the
check.

Caught with GHC's JS backend running Cabal-syntax's lexer. The JS
backend uses arrays to represent unlifted string literals, so the bug
results in out-of-bound JS array access. With the native backends the
out-of-bound array access might trigger a segfault but it is more likely
to just read some random values in the heap. That's why it went
unnoticed.
@andreasabel
Copy link
Member

andreasabel commented Apr 3, 2023

@andreasabel andreasabel merged commit e6c956a into haskell:master Apr 3, 2023
@hsyl20
Copy link
Contributor Author

hsyl20 commented Apr 4, 2023

@andreasabel Awesome. Thanks a lot for your responsiveness!

hsyl20 added a commit to hsyl20/cabal that referenced this pull request Apr 4, 2023
Regenerate Lexer.hs with Alex 3.2.7.2 to fix issue #haskell#8892 (out-of-bound
access due to haskell/alex#223).
hsyl20 added a commit to hsyl20/cabal that referenced this pull request Apr 4, 2023
Regenerate Lexer.hs with Alex 3.2.7.2 to fix issue #haskell#8892 (out-of-bound
access due to haskell/alex#223).
hsyl20 added a commit to hsyl20/cabal that referenced this pull request Apr 4, 2023
Regenerate Lexer.hs with Alex 3.2.7.2 to fix issue #haskell#8892 (out-of-bound
access due to haskell/alex#223).
hsyl20 added a commit to hsyl20/cabal that referenced this pull request Apr 4, 2023
Regenerate Lexer.hs with Alex 3.2.7.2 to fix issue #haskell#8892 (out-of-bound
access due to haskell/alex#223).
@RyanGlScott
Copy link
Member

One consequence of this PR is that alex-generated .hs files now require PatternGuards, which wasn't required before:

$ ghc -XHaskell98 Foo.hs
[1 of 1] Compiling Main             ( Foo.hs, Foo.o )

Foo.hs:3747:19: warning:
    accepting non-standard pattern guards (use PatternGuards to suppress this message)
        (offset >= (0)),
        check <- alexIndexInt16OffAddr alex_check offset,
        (check == ord_c)
     |
3747 |                   | GTE(offset,ILIT(0))
     |                   ^^^^^^^^^^^^^^^^^^^^^..

In fact, I had a project break with alex-3.2.7.2 because of this. Perhaps it would be worth documenting the PatternGuards requirement somewhere?

@andreasabel
Copy link
Member

@hsyl20 : Can this be implemented without PatternGuards?
How about this:

 new_s = if GTE(offset,ILIT(0)) 
                && let check = alexIndexInt16OffAddr alex_check offset in EQ(check,ord_c)
                          then alexIndexInt16OffAddr alex_table offset
                          else alexIndexInt16OffAddr alex_deflt s

@andreasabel
Copy link
Member

hsyl20 added a commit to hsyl20/cabal that referenced this pull request Apr 14, 2023
Regenerate Lexer.hs with Alex 3.2.7.2 to fix issue #haskell#8892 (out-of-bound
access due to haskell/alex#223).
hsyl20 added a commit to hsyl20/cabal that referenced this pull request Apr 26, 2023
Regenerate Lexer.hs with Alex 3.2.7.2 to fix issue #haskell#8892 (out-of-bound
access due to haskell/alex#223).
mergify bot pushed a commit to haskell/cabal that referenced this pull request Apr 26, 2023
Regenerate Lexer.hs with Alex 3.2.7.2 to fix issue ##8892 (out-of-bound
access due to haskell/alex#223).

(cherry picked from commit ca7a8e2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants