FYI: A couple of weird bugs in flex-compatibility mode #137

Closed
teshields opened this issue Jun 26, 2022 · 11 comments

Comments

@teshields

teshields commented Jun 26, 2022

I'm running portability tests for an upcoming release of Ox (https://sourceforge.net/projects/ox-attribute-grammar-compiler/), and I've run into a couple of extremely weird bugs adding reflex as an option for several scanners using the '--yy' option for flex-compatibility mode. I'm using release 3.2.7.

  1. Using the reflex.a (or reflexmin.a) library from the distribution on the Homebrew site under macOS Monterey 12.4 (ARM M1 chip), reflex-generated flex-compatible scanners compiled with the GNU g++-11 v11.3.0_2 (or g++-12 v12.1.0_1) compiler (also from the Homebrew site) fail to link, due to a mismatch in symbol mangling, as far as I can determine. I've never encountered this type of failure before. As a (temporary) workaround, I'm using a separate version of the reflex libraries compiled with the Homebrew GNU g++-11 compiler for testing in these cases.
  2. These flex-compatibility test cases, when configured to use the Boehm-Demers-Weiser garbage collector (BDWGC), again using the GNU g++-11 compiler, segfault in the destructor of the FlexLexer class. If I compile the reflex libraries OMITTING the default '-O2' option using the GNU g++-11 compiler, all test cases work just fine, so this is the version I'm using as the workaround.

To slip the BDWGC library underneath everything, preprocessor macros replace calls to the functions 'free', 'malloc', 'realloc' and 'posix_memalign' in the reflex library with calls to the corresponding BDWGC functions 'GC_free', 'GC_malloc', 'GC_realloc' and 'GC_posix_memalign'.
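A minimal sketch of this kind of preprocessor redirection, for readers following along. The `demo_gc_*` functions are hypothetical stand-ins so the snippet builds without libgc; in the real setup the targets would be `GC_malloc` and `GC_free` from BDWGC. Note that the macros only rewrite calls spelled `malloc(...)` and `free(...)` in source text compiled after the `#define`s; anything compiled without them is untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for GC_malloc / GC_free (assumption: not BDWGC itself).
// They count calls so the redirection can be observed.
static int demo_gc_allocs = 0;
static int demo_gc_frees = 0;

static void* demo_gc_malloc(std::size_t n) { ++demo_gc_allocs; return std::malloc(n); }
static void demo_gc_free(void* p) { ++demo_gc_frees; std::free(p); }

// The redirection: from here on, the preprocessor rewrites every textual
// call to malloc(...) / free(...) in this translation unit.
#define malloc(n) demo_gc_malloc(n)
#define free(p) demo_gc_free(p)

// Allocates and frees once through the redirected names and reports the
// counters as allocs * 10 + frees.
static int redirected_round_trip() {
    void* p = malloc(32);  // expands to demo_gc_malloc(32)
    assert(p != nullptr);
    free(p);               // expands to demo_gc_free(p)
    return demo_gc_allocs * 10 + demo_gc_frees;
}
```

Calling `redirected_round_trip()` once shows both counters advancing, which confirms the calls went through the stand-ins rather than the C library directly.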

In an attempt to debug 2 above, I linked with a version of the BDWGC library configured with assertions enabled.

Under lldb, for the segfault case (reflex libraries compiled with '-O2'), 'GC_posix_memalign' is never called, and I get an assertion failure in the BDWGC 'GC_free' function, saying that the object 'AbstractMatcher::buf_' being freed (absmatcher.h, line 335) wasn't allocated by a BDWGC allocation function. For the 'works just fine' case, 'GC_posix_memalign' is called.

If I can figure out how to modify the code in absmatcher.h to sidestep the GNU g++-11 optimizer bug, I'll let you know. A simple change, assigning the result of the call to 'posix_memalign' to a local 'volatile int', doesn't make a difference.

@genivia-inc
Member

Thank you for the feedback. Let's get this sorted out.

  1. Using the reflex.a (or reflexmin.a) library from the distribution on the Homebrew site under macOS Monterey 12.4 (ARM M1 chip), reflex-generated flex-compatible scanners compiled with the GNU g++-11 (or g++-12) compiler (also from the Homebrew site) fail to link, due to a mismatch in symbol mangling, as far as I can determine.

I have never seen this problem on the test platforms (macOS Intel, RHEL, Android, Cygwin, RPi Jessie). Did you check the versioning, so that a version-matching libreflexmin.a is used? The libs are built as default C++ (non-gnu, non-11, etc.). I am still waiting on my new MacBook Pro M1, so I cannot try this out to see if I can replicate the problem, assuming it's related somehow to that platform. I do recall that a minor change was made recently to fix a related problem, so you may want to use the latest version of RE/flex.

  2. These flex-compatibility test cases, when configured to use the Boehm-Demers-Weiser garbage collector (BDWGC), again using the GNU g++-11 compiler, segfault in the destructor of the FlexLexer class. If I compile the reflex libraries OMITTING the default '-O2' option using the GNU g++-11 compiler, all test cases work just fine, so this is the version I'm using as the workaround.

Interesting observation. Perhaps run valgrind and/or other memory tools? I tested with memory sanitizers some time ago. Perhaps I'll do that again, just in case. My 30+ years of C and C++ experience before writing RE/flex came in handy to structure the code such that dynamic allocation is not difficult to verify as correct. The buffering mechanism with realloc is at the lowest level and thus the least transparent. Other than that, there is not sufficient info here to draw any conclusions.

@genivia-inc
Member

By the way, to bypass the realloc calls, you may want to set the buffer size large for a large window on the input. This should be large enough to hold the longest matching pattern. It does not need to be as large as the input files. The buffer is 64K by default. To change the buffer, use -DBUFSZ=nnn with appropriate nnn (>4096) to compile all source code (BUFSZ is used in absmatcher.h).
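For context, a size override like this usually relies on a guarded default in a header. The following is a sketch of that general pattern only, not a copy of absmatcher.h; `DEMO_BUFSZ` and `make_demo_buffer` are made-up names, and a later comment in this thread notes that the real flag is spelled -DREFLEX_BUFSZ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Guarded default: a -DDEMO_BUFSZ=nnn on the compiler command line wins,
// otherwise the 64K default applies. DEMO_BUFSZ is a hypothetical name.
#ifndef DEMO_BUFSZ
#define DEMO_BUFSZ (64 * 1024)
#endif

// Reject values at or below the 4096 lower bound mentioned above.
static_assert(DEMO_BUFSZ > 4096, "DEMO_BUFSZ must be larger than 4096");

// Allocates a buffer of the configured size.
inline std::vector<char> make_demo_buffer() {
    std::vector<char> buf(DEMO_BUFSZ);
    assert(buf.size() == static_cast<std::size_t>(DEMO_BUFSZ));
    return buf;
}
```

Because the value is baked in at compile time, every source file that includes the header has to be compiled with the same definition, which is why the flag must be applied to all source code and not just the scanner.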

@teshields
Author

teshields commented Jun 27, 2022

I was able to do a quick check on my older Intel Mac mini (macOS Monterey 12.4):

  1. same results
  2. same results (test case, reflex/libreflex v3.2.7, libgc v8.0.6 all compiled & linked using Homebrew GNU g++-11 v11.3.0_2)

@genivia-inc
Member

I have more questions than answers, since there isn't much here to offer any clues. Questions like these:

  • does the FlexLexer class instantiation and destruction crash without scanning input?
  • does increasing -DBUFSZ=nnn affect this issue?
  • does this issue happen with the BDWGC library? Or also without it?
  • is the issue affected by the choice of libreflex or libreflexmin? Or building from source without libraries (reflex/lib files)?
  • what does valgrind report? Or...
  • try clang with -fsanitize=memory and/or -fsanitize=leak (see https://clang.llvm.org/docs/LeakSanitizer.html) to check the application for memory and leak issues

@teshields
Author

Here are a few additional data points.

Using a macOS Big Sur 11.6.3 VM, with Homebrew downloaded reflex v3.2.7 & the bdwgc library v8.0.6:

  • g++-11 v11.3.0_2: bdwgc library symbol mangling error
  • g++-10 v10.3.0: bdwgc library symbol mangling error
  • g++-9 v9.5.0: bdwgc library symbol mangling error

Using a Linux Ubuntu 22.04 LTS VM, bdwgc library v8.0.6-1.1build1 from the Ubuntu repository, reflex v3.2.7 compiled locally with the g++11 compiler (not in the Ubuntu repository)

  • g++11 v11.2.0-19ubuntu1: no problemo

There is something odd with the BDWGC library code in macOS.

@teshields
Author

Responding to: I have more questions than answers (above) ...

  1. does the FlexLexer class instantiation and destruction crash without scanning input?
  • if I feed one of the simple test cases a 0-length input file, I get "syntax error" (from the parser) followed by the segfault in the FlexLexer destructor
  2. does increasing -DBUFSZ=nnn (s.b. -DREFLEX_BUFSZ) affect this issue?
  • the simple test case I used has an input file that is only 100 bytes long, so changing from the default of 64K has no effect, but just for grins, I tried both "8096" & "(128*1024)" with the same segfault in both cases
  3. does this issue happen with the BDWGC library? Or also without it?
  • works just fine w/o the BDWGC library, only happens if I slide in the BDWGC library

4.1. is the issue affected by the choice of libreflex or libreflexmin?

  • same segfault result with either

4.2. Or building from source without libraries (reflex/lib files)?

  • linking in the libreflex_*.o files in the lib/ directory directly, rather than with the libreflex.a file:
    • reflex compiled with Apple's native g++/clang++, test case compiled with Homebrew GNU g++-11 v11.3.0_2
      name mangling error
    • reflex & test case compiled with Homebrew GNU g++-11 v11.3.0_2 (default -O2 for reflex)
      segfault in the FlexLexer destructor

5.1. what does valgrind report? Or...

  • I've never used valgrind, and it isn't in the Homebrew repository for macOS
  • I'll go download the source, but that will probably be later this week

6.1. try clang with -fsanitize=memory (s.b. =address) and/or -fsanitize=leak

  • using Homebrew LLVM clang++ v13.0.1

    • compiled w/ -fsanitize=address: no report

    • compiled w/ -fsanitize=leak I get the following:

      LeakSanitizer: bad pointer 0x00019cc28f99
      LeakSanitizer: CHECK failed: sanitizer_allocator_secondary.h:177 "((IsAligned(reinterpret_cast(p), page_size_))) != (0)" (0x0, 0x0) (tid=101137095)
      Abort trap: 6

      but I don't know what this is telling me

  • using Apple's g++/clang++ v1316.0.21.2.5

    • compiled w/ -fsanitize=address: no report
    • compiled w/ -fsanitize=leak: compiler reports unsupported option

@genivia-inc
Member

Thanks for the details.

  • compiled w/ -fsanitize=leak I get the following:
    LeakSanitizer: bad pointer 0x00019cc28f99
    LeakSanitizer: CHECK failed: sanitizer_allocator_secondary.h:177 "((IsAligned(reinterpret_cast(p), page_size_))) != (0)" (0x0, 0x0) (tid=101137095)
    Abort trap: 6
    but I don't know what this is telling me

Looks like some internal error in the sanitizer. See also google/sanitizers#1468

I don't really see any issues here, except when BDWGC is used.

There is something odd with the BDWGC library code in macOS.

Looks like it. I don't know what else can be done.

@teshields
Author

FYI: my last debugging efforts (later this week):

  1. I'll try importing the source for one of my test cases into Xcode to see if it has any better clues to offer. Importing existing somewhat complicated command line code (wrt compile & link options & libraries) is always a bit of a pain, but it might tell me something.
  2. try out the "experimental" release of the BDWGC library v8.2.0
  3. valgrind

I'll report back anything interesting.

@genivia-inc added the "help wanted" (Extra attention is needed) and "question" (A technical question that has or needs clarification) labels on Aug 12, 2022
@teshields
Author

teshields commented Aug 29, 2022

I'm delinquent in reporting back on my debugging status with respect to these 2 'weird bugs'.

Note: these ‘bugs’ are not specific to flex-compatibility mode.

The 'weird bug 1' remains a mystery: In some cases, the macOS Apple & (Homebrew) LLVM C++ compilers just do not generate (mangled) name references that are compatible with those generated by the macOS (Homebrew) GNU C++ compilers for the RE/flex library. I don't see the same problem with Linux LLVM and GNU C++ compilers and the RE/flex library installed from the Ubuntu repository.
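One possible angle on this, not confirmed anywhere in this thread: on macOS, Apple's and LLVM's clang++ normally use the libc++ standard library, while Homebrew's GNU g++ uses libstdc++, and the two place types such as std::string in different inline namespaces, so the same source-level signature can mangle differently. A minimal sketch for comparing what each compiler emits follows; `StringTaker` is a hypothetical signature, not one taken from RE/flex.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical signature standing in for any library function that takes
// a std::string (assumption: not an actual RE/flex declaration).
using StringTaker = void (*)(const std::string&);

// Returns the implementation-defined type name. With GCC and Clang this is
// the Itanium-ABI mangled name, which embeds the namespace of std::string.
inline std::string mangled_string_taker() {
    const char* name = typeid(StringTaker).name();
    assert(name != nullptr);
    return std::string(name);
}
```

Printing the result of `mangled_string_taker()` from a program built with each compiler, and comparing the two strings, would show whether the standard library types mangle differently on a given machine. Running `nm` on the library and on the failing object file and comparing the undefined symbols gives the same information for the real code.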

On the other hand, 'weird bug 2' is a self-inflicted wound: My original approach to sliding the BDWGC garbage collector underneath Ox and its test cases worked with Flex, since there is no Flex-specific library to be linked into the executables. However, I didn't originally notice that the RE/flex library member 'matcher.o' contains external references to both C memory allocation functions and C++ new/delete operators, so in particular, the object 'AbstractMatcher::buf_' was being freed by the BDWGC 'GC_free' function but allocated by the C 'posix_memalign' function. I cannot say exactly why the segfault only occurs with optimized code, but I'm not surprised, as I've seen analogous segfaults in the past. I've replaced my preprocessor approach with a replacement library for the C allocation functions, whose members are just shells around calls to the corresponding BDWGC functions.
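The allocate-with-one-allocator, free-with-another mismatch described above can be illustrated with a toy model. This is only a sketch: `ToyHeap` is a hypothetical stand-in that tracks the blocks it handed out, loosely mirroring the ownership check behind the BDWGC assertion, and it is not how BDWGC is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdlib.h>  // posix_memalign (POSIX), malloc, free

// Toy stand-in for a collector heap: it remembers which blocks it handed out.
class ToyHeap {
public:
    void* allocate(std::size_t n) {
        void* p = ::malloc(n);
        if (p != nullptr) owned_.insert(p);
        return p;
    }
    // Refuses (returns false) when asked to free a block it never allocated.
    bool release(void* p) {
        if (owned_.erase(p) == 0) return false;
        ::free(p);
        return true;
    }
private:
    std::set<void*> owned_;
};

// Allocates with the system allocator, then asks the toy heap to free it.
inline bool mismatched_free_is_rejected() {
    ToyHeap heap;
    void* buf = nullptr;
    if (::posix_memalign(&buf, 64, 4096) != 0) return false;
    bool rejected = !heap.release(buf);  // the heap never saw this block
    ::free(buf);                         // hand it back to its real owner
    return rejected;
}

// Allocates and frees through the same heap, which is the consistent case.
inline bool matched_free_is_accepted() {
    ToyHeap heap;
    void* p = heap.allocate(128);
    return p != nullptr && heap.release(p);
}
```

A link-time replacement library avoids the mismatch because every caller, including prebuilt library objects, resolves the allocation and the release to the same implementation.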

@genivia-inc
Member

Can we close this?

@genivia-inc closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 25, 2023
@teshields
Author

teshields commented Mar 25, 2023 via email
