com.sun.jna.NativeLibraryTest fails on macOS 12 (Monterey) #1423

Closed
ezarko opened this issue Mar 14, 2022 · 21 comments · Fixed by #1502
Comments

@ezarko

ezarko commented Mar 14, 2022

  1. Version of JNA and related jars
    0774f82 master
  2. Version and vendor of the java virtual machine
    java 17.0.2 2022-01-18 LTS
    Java(TM) SE Runtime Environment (build 17.0.2+8-LTS-86)
    Java HotSpot(TM) 64-Bit Server VM (build 17.0.2+8-LTS-86, mixed mode, sharing)
  3. Operating system
    macOS Monterey 12.2.1 (21D62)
  4. System architecture (CPU type, bitness of the JVM)
    Apple M1 Pro
  5. Complete description of the problem
    [junit] Testsuite: com.sun.jna.NativeLibraryTest
    [junit] Tests run: 22, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.21 sec
    [junit]
    [junit] Testcase: testAvoidDuplicateLoads(com.sun.jna.NativeLibraryTest): FAILED
    [junit] Library should be newly loaded after explicit dispose of all native libraries expected:<1> but was:<10>
    [junit] junit.framework.AssertionFailedError: Library should be newly loaded after explicit dispose of all native libraries expected:<1> but was:<10>
    [junit] at com.sun.jna.NativeLibraryTest.testAvoidDuplicateLoads(NativeLibraryTest.java:91)
    [junit] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    [junit] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    [junit]
    [junit]
    [junit] Test com.sun.jna.NativeLibraryTest FAILED
  6. Steps to reproduce
    ant
@matthiasblaesing
Member

From my perspective you are seeing a flaky test. From the failing test:

    // Give the system a moment to unload the library; on OSX we
    // occasionally get the same library handle back on subsequent dlopen
    Thread.sleep(2);

Please rerun the test.
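
The fixed `Thread.sleep(2)` quoted above is one place where timing sensitivity could creep in. As a rough sketch only (this is not JNA's test code; `PollUntil` and `await` are hypothetical names), a bounded poll waits just as long as needed up to a limit. It would not help if the OS never unloads the library at all.

```java
import java.util.function.BooleanSupplier;

final class PollUntil {
    /**
     * Re-checks the condition every intervalMillis until it holds or
     * timeoutMillis has elapsed. Returns true if the condition held.
     */
    static boolean await(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition holding
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

A test could then call something like `PollUntil.await(() -> libraryLooksUnloaded(), 500, 5)`, where `libraryLooksUnloaded` is a placeholder for whatever check the test performs.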

@ezarko
Author

ezarko commented Mar 14, 2022

I tried it twice more. I got the same result each time.

@matthiasblaesing
Member

Sorry, not reproducible by me.

On:

  • aarch64 with the default JDK on that system Temurin-17.0.1+12 (the system provided by @dkocher)
  • aarch64 with JDK Temurin-17.0.2+8 (latest Temurin build)
  • amd64 with 15.0.2+7-27 (if I remember correctly this was an AdoptOpenJDK build)

all run clean.

@matthiasblaesing
Member

@dkocher indicated that he can reproduce the problem. I still cannot, so we need to pinpoint it better. Could you (@ezarko and @dkocher) please indicate which Ant version you are using?

@ezarko
Author

ezarko commented Mar 15, 2022

Apache Ant(TM) version 1.10.12 compiled on October 13 2021

@tresf

This comment was marked as outdated.

@matthiasblaesing
Member

Thank you for the inspiration.

I slightly modified the test and ran it in isolation with the DYLD_PRINT_APIS environment variable set to 1. This shows that the test library is dlopened only once and that dlclose is invoked on that handle. dyld does not report whether it thinks dlclose succeeded, but I added some debugging around it: JNA already checks the return value and raises an exception if dlclose fails, and the return value indicates that it succeeded.
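
For anyone wanting to repeat that kind of run from Java, here is a minimal sketch that prepares a child process with `DYLD_PRINT_APIS=1` in its environment. The class and method names are hypothetical, and the command to launch is left to the caller; the exact invocation used in the comment above is not stated in this thread.

```java
import java.util.List;

final class DyldTrace {
    /**
     * Builds (but does not start) a child process whose environment has
     * DYLD_PRINT_APIS=1, so dyld logs dlopen/dlclose calls on macOS.
     */
    static ProcessBuilder withDyldPrintApis(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("DYLD_PRINT_APIS", "1");
        pb.inheritIO(); // let the child's output, including dyld's log, reach this console
        return pb;
    }
}
```

Calling `DyldTrace.withDyldPrintApis(Arrays.asList("ant")).start().waitFor()` would run the build named under "Steps to reproduce" with tracing enabled.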

Having read rust-lang/rust#28794 (comment), being able to unload a library on macOS or its derivatives is not something I'd rely on.

@tresf
Contributor

tresf commented Jan 12, 2023

I tried changing all calls to RTLD_LOCAL to see if this could be coerced, but no luck. Also, no luck manipulating DYLD_<FOO> flags. No matter what I do, callCount() always remains at 10.

@tresf
Contributor

tresf commented Jan 12, 2023

Pinging @dyorgio, he's helped me with some obscure macOS issues in the past. 🍻

@dbwiddis changed the title from "com.sun.jna.NativeLibraryTest fails on Apple M1 Pro" to "com.sun.jna.NativeLibraryTest fails on macOS 12 (Monterey)" on Jan 13, 2023
@matthiasblaesing
Member

Here is the test setup:

https://github.com/java-native-access/jna/tree/macos_debugging

And this is the result:

https://github.com/java-native-access/jna/actions/runs/3913589087/jobs/6689649929

@dbwiddis
Contributor

dbwiddis commented Jan 13, 2023

And this is the result:

https://github.com/java-native-access/jna/actions/runs/3913589087/jobs/6689649929

I see two dlopen() calls at Line 1152 and again at Line 1264.

So the reference count is still 1 at dlclose().

Er, nevermind, there it is at Line 1247.

@dbwiddis
Contributor

I think I'm convinced this test should be removed (or altered to test the java-side reference, not the native handle), at least on macOS and possibly all *nix.

The POSIX spec does not require the OS to unload the library:

An application writer may use dlclose() to make a statement of intent on the part of the process, but this statement does not create any requirement upon the implementation.

and

Although a dlclose() operation is not required to remove any functions or data objects from the address space, neither is an implementation prohibited from doing so.
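
To illustrate what "test the Java-side reference, not the native handle" could mean, here is a small self-contained sketch. `HandleRegistry` and `Handle` are hypothetical stand-ins, not JNA classes. The idea is to assert only on bookkeeping the Java code controls, and to assert nothing about whether the OS really unmapped the library.

```java
import java.util.HashMap;
import java.util.Map;

final class HandleRegistry {
    /** Hypothetical Java-side wrapper around a loaded library. */
    static final class Handle {
        final String name;
        private boolean disposed;

        Handle(String name) {
            this.name = name;
        }

        boolean isDisposed() {
            return disposed;
        }
    }

    private final Map<String, Handle> cache = new HashMap<>();

    /** Returns the cached wrapper for a name, creating it on first use. */
    synchronized Handle getInstance(String name) {
        return cache.computeIfAbsent(name, Handle::new);
    }

    /** Marks every wrapper as disposed and forgets it. */
    synchronized void disposeAll() {
        for (Handle h : cache.values()) {
            h.disposed = true;
        }
        cache.clear();
    }
}
```

A test built this way would check that repeated `getInstance` calls return the same object, and that after `disposeAll()` a fresh object is returned while the old one reports `isDisposed()`. Both properties hold regardless of what `dlclose()` does underneath.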

Some other poking about indicates that there may be a gcc compiler flag which could be relevant. This man page (on Linux but referencing gcc) might be of interest:

-fno-gnu-unique
On systems with recent GNU assembler and C library, the C++
compiler uses the "STB_GNU_UNIQUE" binding to make sure that
definitions of template static data members and static local
variables in inline functions are unique even in the presence
of "RTLD_LOCAL"; this is necessary to avoid problems with a
library used by two different "RTLD_LOCAL" plugins depending
on a definition in one of them and therefore disagreeing with
the other one about the binding of the symbol. But this
causes "dlclose" to be ignored for affected DSOs; if your
program relies on reinitialization of a DSO via "dlclose" and
"dlopen", you can use -fno-gnu-unique.

@tresf
Contributor

tresf commented Jan 13, 2023

Digging deeper into that gnu flag @dbwiddis posted... https://bugzilla.redhat.com/show_bug.cgi?id=1083292

Perhaps LLVM relevant: https://reviews.llvm.org/D42865?id=132665

...but since the diff only touched ELF source files, I don't think this is available on macOS.

@dbwiddis
Contributor

Here's something of interest. The dlopen call uses RTLD_LAZY and RTLD_GLOBAL flags:

dlopen("/Users/runner/work/jna/jna/build/native-darwin-x86-64/libtestlib.dylib", 0x00000009)

 RTLD_GLOBAL  Symbols exported from this image (dynamic library or bundle)
              will be available to any images build with -flat_namespace
              option to ld(1) or to calls to dlsym() when using a special
              handle.

 RTLD_LOCAL   Symbols exported from this image (dynamic library or bundle)
              are generally hidden and only availble to dlsym() when
              directly using the handle returned by this call to dlopen().
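
As a quick check of how the mode argument `0x00000009` in the logged call breaks down, the sketch below decodes it bit by bit. The constant values are an assumption taken from the macOS `<dlfcn.h>` header (other platforms use different values), and `DlopenFlags` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

final class DlopenFlags {
    // Assumed values from macOS <dlfcn.h>; Linux and others differ.
    static final int RTLD_LAZY = 0x1;
    static final int RTLD_NOW = 0x2;
    static final int RTLD_LOCAL = 0x4;
    static final int RTLD_GLOBAL = 0x8;

    /** Lists the names of the flags set in a dlopen mode bitmask. */
    static List<String> decode(int mode) {
        List<String> names = new ArrayList<>();
        if ((mode & RTLD_LAZY) != 0) names.add("RTLD_LAZY");
        if ((mode & RTLD_NOW) != 0) names.add("RTLD_NOW");
        if ((mode & RTLD_LOCAL) != 0) names.add("RTLD_LOCAL");
        if ((mode & RTLD_GLOBAL) != 0) names.add("RTLD_GLOBAL");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x00000009)); // prints [RTLD_LAZY, RTLD_GLOBAL]
    }
}
```

Under those assumed values, `0x9` is `0x1 | 0x8`, which matches the RTLD_LAZY plus RTLD_GLOBAL reading given above.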

@tresf
Contributor

tresf commented Jan 13, 2023

Here's something of interest. The dlopen call uses RTLD_LAZY and RTLD_GLOBAL flags:

dlopen("/Users/runner/work/jna/jna/build/native-darwin-x86-64/libtestlib.dylib", 0x00000009)

Yeah, I tried to coerce these here #1423 (comment) to no avail. Happy to be proven wrong.

@dbwiddis
Contributor

Digging deeper into that gnu flag @dbwiddis posted... https://bugzilla.redhat.com/show_bug.cgi?id=1083292

A comment on that report says that our test is the bug, and we simply should not be relying on unloading behavior by dlclose() as I mentioned above.

The only issue are the assumptions about dlclose behavior, dlclose is certainly not required to always unload the library, even before unique symbols often it has been unable to either temporarily (until some other libraries are dlclosed) or permanently unload certain libraries.

If some library relies on that, it is simply buggy.

@tresf
Contributor

tresf commented Jan 13, 2023

Yeah, I tried to coerce these here #1423 (comment) to no avail. Happy to be proven wrong.

I didn't specifically mention it, but I also tested forcing these directly in testlib.c dispatch.c.

A comment on that report says that our test is the bug, and we simply should not be relying on unloading behavior by dlclose() as I mentioned above.

I think this statement is true for general purposes, but even Apple staff say it should work if we're not using certain Objective-C bindings.

@dbwiddis
Contributor

I think this statement is true for general purposes, but even Apple staff say it should work if we're not using certain Objective-C bindings.

That statement was also relevant for 10.x. This new behavior is on 12.x (or possibly Xcode 13).

But the unloading behavior is simply not required by the standard, Apple (and any POSIX-compliant implementation) is free to change things, and we should not rely on it.

@tresf
Contributor

tresf commented Jan 13, 2023

Yeah, that's my fear as well, although it's good to exhaust all options before disabling the unit test on Apple.

@dyorgio
Contributor

dyorgio commented Jan 13, 2023

Hi @tresf,

I'm not sure, but maybe editing DYLD_FALLBACK_LIBRARY_PATH before loading could solve it:

StackOverflow ref

Very nice repo 😄 :

https://github.com/flandr/wtf-osx-dlopen

@tresf
Contributor

tresf commented Jan 13, 2023

Hi @tresf,

👋 thanks for chiming in. Sorry I believe my initial post (quote from @dbwiddis) is quite off-topic now (I've hidden it). I think the issue is actually with dlclose(). Thanks for those links. wtf-osx-dlopen 🤣
