macOS: problems with Java AWT::Robot and AWT::Event-Queue (window not displayed, hangs waiting for events to finish) #911

Open
RaiMan opened this issue Dec 19, 2020 · 17 comments
Labels: bug (Unable to deliver desired behavior: crash, fail, untested); on-hold (Problem requiring further user input to address)

Comments

@RaiMan

RaiMan commented Dec 19, 2020

This might be related to #906

My findings are when trying to use SikuliX through JPype on macOS.
On Windows and Linux it seems to work as reported by other users.

Find the issue doc here: RaiMan/SikuliX1#400

For my plans Python-to-SikuliX I will ignore JPype, until this problem area is fixed in Java.

@RaiMan RaiMan changed the title macOS: problems with Java AWT::Robot and AWT::Event-Queue (window not displayed, hangs waiting for events too finish) macOS: problems with Java AWT::Robot and AWT::Event-Queue (window not displayed, hangs waiting for events to finish) Dec 19, 2020
@Thrameos
Contributor

Thanks. This is definitely a step on the path towards fixing it.

So the path forward on this is that I need to go to the travel department, pick up a loaner OSX laptop, and set up a development environment. Unfortunately, there are a few logistical problems with that plan. First, I am not allowed on site to get to the travel department due to COVID-19. Second, I suspect the travel department is currently closed down due to COVID-19. Third, it being December, finding someone who knows how to resolve those issues means the soonest I will be able to work on this will be January.

@RaiMan
Author

RaiMan commented Dec 20, 2020

@Thrameos
If you have something that could be tested/evaluated in macOS meanwhile, feel free to come back and contact me directly.
Happy Christmas. I wish you all the best for 2021.

@Thrameos
Contributor

Unfortunately, this is the sort of problem for which I would need to do a fair amount of exploring. It is only in retrospect, after you find a series of cases that succeed or fail, that you can find the documentation that points to the source of the problem.

@RaiMan
Author

RaiMan commented Dec 20, 2020

@Thrameos
I feel with you ;-)

@Thrameos
Contributor

Okay, I vaguely recall seeing some information on this. The problem is that Java and AppKit have two conflicting requirements. Java wants to handle events on a separate event handler thread, but AppKit wants to handle all events on the AppKit main thread. When you start Python, you are almost always running on the main thread (of Python), which may be the AppKit main thread. Thus only that thread can handle events.

This runs into a serious problem. You want to keep running on the Python thread, but then you can't service events. To handle this broken situation you would need to transfer control and start running Python on a different thread and tell the main thread to service the events.

Even if the Python main is not the AppKit main, the AppKit main thread has almost certainly already been started by Python when Java starts. So now Java is going to start its event thread in a different thread than the current AppKit thread, and we are headed for another deadlock.

They gave some code to "help" with this issue, but if the Python main is already the AppKit main, there is no real way to tell Python to continue execution starting from this line in the code. For things like interactive programming, this is always going to be a weak point. It is not clear that the Python main thread has to be the AppKit thread, but I don't know enough about the API to determine that.

https://wiki.openjdk.java.net/display/MacOSXPort/Java+vs.+AppKit+Threading+Manifesto

Unfortunately, not having a Mac, there isn't a lot I can do to develop a better solution, and this weakness is likely to cause issues in the future. We really need an expert to guide us on how to get this to work.
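
To make the constraint concrete, here is a minimal pure-Python sketch of the "transfer control" idea: the user's code moves to a worker thread while the original main thread does nothing but service queued work. It uses only the standard library; `call_on_main` and `run_with_main_loop` are hypothetical names, and the queue loop is merely a stand-in for the AppKit run loop, not JPype or AppKit API.

```python
import queue
import threading

# Hypothetical names throughout; this is not JPype or AppKit API.
_main_queue = queue.Queue()   # work that must run on the main thread
_STOP = object()              # sentinel that ends the main-thread loop


def call_on_main(func, *args):
    """Ask the main thread to run func(*args) and wait for the result."""
    done = threading.Event()
    box = {}

    def task():
        try:
            box["result"] = func(*args)
        except BaseException as exc:
            box["error"] = exc
        finally:
            done.set()

    _main_queue.put(task)
    done.wait()
    if "error" in box:
        raise box["error"]
    return box["result"]


def run_with_main_loop(user_main):
    """Run user_main on a worker thread while this thread services events."""
    outcome = {}

    def worker():
        try:
            outcome["result"] = user_main()
        finally:
            _main_queue.put(_STOP)   # always release the main loop

    t = threading.Thread(target=worker, name="pymain")
    t.start()
    while True:                      # stand-in for the AppKit run loop
        task = _main_queue.get()
        if task is _STOP:
            break
        task()
    t.join()
    return outcome.get("result")


if __name__ == "__main__":
    def user_main():
        # Runs off the main thread; GUI-style work is forwarded to it.
        return call_on_main(lambda: threading.current_thread().name)

    print(run_with_main_loop(user_main))
```

The sketch also shows why interactive use is the weak point: everything the user wants to run has to be wrapped in `user_main` up front, because the original thread never returns to the prompt while the loop is running.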

@Thrameos
Contributor

Thrameos commented Dec 21, 2020

Not sure if this information helps. It seems to indicate that we need to add an enter and exit section to our JNI calls to work with AppKit.

http://3.210.201.83/downloads/MacOSX10.8.sdk/System/Library/Frameworks/JavaVM.framework/Frameworks/JavaNativeFoundation.framework/Versions/A/Headers/JNFJNI.h

There are only a handful of incoming JNI calls from Java, the most notable being the JProxy callback hook. There are others on the factory hooks, but those are not very likely to stall unless the user plugs something horrible into the class creation routines. If someone wants to contribute by adding some of these blocks and seeing if it helps with the deadlock, it would be appreciated. I am not sure it is the right path, but perhaps it will help.

Edit: These may be pointlessly out of date as most JNF comes up with references from 10 years ago.

@Thrameos
Contributor

http://mirror.informatimago.com/next/developer.apple.com/technotes/tn2005/tn2147.html

It is important to understand that the JVM cannot be started from the native application's main thread if your Java code uses an AWT/Swing-based GUI; in such applications the main thread must be kept free for use by Cocoa's event loop. Both of the above examples abide by this rule: Listing 16 starts a CFRunLoop in main, which is used by the NSApplication instance that the AWT creates. Listing 17 responds to NSApplication's applicationWillFinishLaunching: delegate method, indicating that an NSApplication is already in place on the main thread. Starting an AWT application from the native launcher's main thread significantly affects performance and may produce unrecoverable errors. Listing 18 shows a paraphrased example of the most common error thrown...

Again, not sure if this applies to our Python/Java model. Assuming it does, we would need to start Java in a different thread than main and then attach Python's main thread as a daemon thread. This is likely to cause issues with our shutdown model, though perhaps there is a way to handle it, because if Python is a daemon thread it won't count toward the shutdown.

@RaiMan
Author

RaiMan commented Dec 21, 2020

Very good findings.
I guess you hit the problem cause.

But this means IMHO, that the approach of JPype in macOS only works with Java stuff, that does not use any AWT/Event-Loop features. Hence JPype -> SikuliX will not work on macOS, until someone finds a solution based on the above mentioned evaluations (sorry, but I am not one of those someone's).

I will continue to watch the further development in this problem area of JPype.

@Thrameos
Contributor

Assuming this actually is the issue, there are two approaches.

  1. Structural requirements for OSX using JPype modules. This is what is happening in the existing jpype/gui module. If you code in a particular style you can dodge the problem. But the downside is that unless someone wants OSX, they won't code it in the required style, and after it is written making a switch would be costly. The result is that most JPype-using projects that exercise AWT won't work.

  2. Structural changes to JPype. In this case, rather than starting the JVM in the main thread, we have startJVM and shutdownJVM work with a C++ spawned thread (yes, that means we need to create a thread using the OS-specific commands, as we need the thread to be able to live for an arbitrary scope regardless of the state of Java or Python). startJVM would create the thread, which calls the real start method, and wait for the start to complete to initialize resources. It would then attach the thread as a daemon so that Python actions can continue even if Java is terminated. The jmain thread would just sleep until it receives a message to proceed to terminate. Shutdown would then become much softer, as it is just a message to the jmain thread to start the shutdown rather than a hard cut point. In the OSX case, we would also try to create a second appmain thread so that there are no deadlocks between the jmain and pymain threads. This may cause weird side effects between qt/java on OSX, so a lot of testing is required.
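
The lifecycle described in option 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: a Python thread stands in for the OS-level C++ thread, `JvmHost` is a hypothetical name, and `start_vm` / `stop_vm` are placeholder callables rather than real JPype entry points.

```python
import queue
import threading


class JvmHost:
    """Hypothetical stand-in for a thread that owns the JVM lifetime."""

    def __init__(self, start_vm, stop_vm):
        self._start_vm = start_vm      # placeholder for the real start call
        self._stop_vm = stop_vm        # placeholder for the real shutdown call
        self._ready = threading.Event()
        self._messages = queue.Queue()
        self._error = None
        self._thread = threading.Thread(target=self._run, name="jmain")

    def _run(self):
        try:
            self._start_vm()
        except Exception as exc:
            self._error = exc          # report startup failure to the caller
            self._ready.set()
            return
        self._ready.set()              # startup finished, caller may continue
        self._messages.get()           # sleep until told to terminate
        self._stop_vm()

    def start(self):
        self._thread.start()
        self._ready.wait()             # block until startup has completed
        if self._error is not None:
            raise self._error

    def shutdown(self):
        self._messages.put("terminate")  # a message, not a hard cut point
        self._thread.join()
```

With this shape, `start()` returns only after the hosted start call has finished, and `shutdown()` is just a message followed by a join, which is the "softer" shutdown mentioned above.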

I am going to assume that option 1 is a poor one and examine the changes that result from option 2.

The user-visible changes required for the second option are

  • startJVM would have a keyword argument headless, which would be false by default. If headless is true, then on OSX JPype would not try to spawn the appmain thread.
  • shutdownJVM would have a keyword argument block, defaulting to true, which tells whether the shutdown must complete before Python proceeds. There is one danger at this point: if you proceed and the Python machine completes its shutdown sequence before Java completes its own, then we will segfault on any existing proxy objects. The "atexit" for Python would need to wait for the Java one to complete so we avoid the danger point. We would likely require an additional internal "waitJVMshutdown" command to ensure we checkpoint.

The other change is jpype/gui would need to be deprecated. It was not documented and I am not aware of any users, but once we make this structure change it may interfere or confuse.

The consequences for existing JPype-using projects would be minimal. Some would require adding headless to continue operating on OSX as before. Others may have minor changes in the shutdown behavior to deal with the softer shutdown. There are still projects having issues with our current shutdown, though, so from that perspective it may make things better. (I suspect projects that are still having issues have "attachToJVM" calls buried in them that are causing the JVM to stall on exit.)

I can likely make these structural changes without actually having access to an OSX machine, but there is little way I can actually debug it enough to know if it fixes the OSX issues until I gain access. So unless someone volunteers to be a guinea pig, I should likely place this on the wishlist for now and start work once I can access an OSX machine.

@marscher The proposed changes are structural in nature and likely have at least some consequences from the user perspective, even if I keep them minimal. If you have some direction on the path to take, it would be appreciated. We have had a lot of traffic from OSX users here, so I am guessing interest is relatively high. But if you think I have overlooked something from the user perspective, then I should adjust the plans.

@RaiMan
Author

RaiMan commented Dec 22, 2020

@Thrameos
I could do tests with a dev version and even some debugging in the Python and Java area (I am using PyCharm and IDEA).
... but I am not familiar with the area C++/Apple native.
All the best.

@Thrameos
Contributor

Sure thing. Right now I am just debating the priorities for this particular task relative to others. My main large tasks appear to be

  • Implementation of the reverse bridge allowing Java to call Python
  • Extension of Java classes from within Python
  • Dealing with OSX issues

The order in which I complete these tasks will have a large influence on the final product. If I have the reverse bridge, then how I accomplish the other tasks may be very different than if I proceed in the opposite order, not to mention when a solution may become available. The reverse bridge is the largest of the tasks in terms of scope, but it also simplifies all the other tasks, which complicates the decision process. I will think this over and get back to you when I have defined a path.

@RaiMan
Author

RaiMan commented Dec 23, 2020

@Thrameos
I agree and wish you all the best on the way to your targets.

Feel free, to delete this remark:
I know that using the alternative Python-Java bridge py4j works on macOS (I have a sleeping project that implements the SikuliX features in C-Python this way). I guess that the performance with JPype should be slightly better, but py4j can also be used over the net (different target machine).

@Thrameos
Contributor

Thrameos commented Dec 23, 2020

Py4j is intended for a different audience than JPype. It exists to create an RPC like connection between Python and Java. I often encourage projects that require a transient transactional type connection rather than a highly integrated one to use Py4j. Both approaches have their strong points and weak points.

@adrian-evo

Since I am one of the JPype users that would be interested in a solution for this macOS issue, I would like to let you know that for the last five years I have been using SikuliX with Robot Framework with the single available solution of Remote Server and the XML-RPC protocol. Since this method is slow, I had to start the server once before any test run and close it afterwards, although in web testing this is not the most important thing to do. The ideal solution is that all tests are independent of each other and there should be no need to start e.g. a remote server just to be able to use a certain library when needed.

For that reason I was lately proposing a new Python library (robotframework-sikulixlibrary) that is working great on Windows (also tested on Linux), much more advanced than with XML-RPC protocol so I am very happy to use it also in the future, and to suggest it also to other Robot Framework SikuliX users. For us, the other alternative would be e.g. image libraries built on top of pyautogui, but it seems these libraries are quite limited when compared with SikuliX :)

@RaiMan
Author

RaiMan commented Dec 23, 2020

@adrian-evo
Looking into your JPype based implementation of robotframework-sikulixlibrary IMHO it would be no problem, to also implement the py4j bridge as an option for those who want to use it for whatever reason.
I will check whether I am right with a fork of your project. It is a good opportunity to evaluate the differences between the two approaches with respect to performance and implementation aspects.
Anyways I will add an option to start/stop the py4j-server in SikuliX via the extension feature.
Wish you all the best and a healthy 2021.

@adrian-evo

@RaiMan
I agree that having the possibility to choose one of the options would be the best solution, since the two approaches could have different usage reasons, as mentioned also by @Thrameos, and as long as the implementation of the keywords library can be kept as simple as possible. So let's see what the final findings will be.
Have a safe and happy holiday season!

@Thrameos Thrameos self-assigned this Jan 23, 2021
@Thrameos Thrameos added bug Unable to deliver desired behavior (crash, fail, untested) on-hold Problem requiring further user input to address labels Jan 23, 2021
@Thrameos
Contributor

This one is still waiting for access to an OSX machine. I believe the solution would be to spawn JVM in a separate thread and then use a message to kill it on shutdown rather than keeping the main JVM thread and main Python thread the same. This may have interactions with the shutdown sequence. For symmetry we would want to use this same path for windows, osx, and linux otherwise we may end up with different shutdown behavior on different systems.

This fix would require porting enough os specific threading libraries so that we can create a thread, signal it to terminate, and join it after. We also may need to add a "gui" option which informs JPype that it should also launch an appkit thread. Unfortunately, I can't really complete this work until I am able to replicate it locally.

3 participants