8293067: (fs) Implement WatchService using system library (macOS) #10140

Closed
mkartashev wants to merge 15 commits

mkartashev
Member

@mkartashev mkartashev commented Sep 2, 2022

This is an implementation of WatchService based on the File System Events (FSEvents) API, which can generate events whenever a change occurs in a watched directory or underneath it. Because the API naturally supports "recursive" watching, the watch service supports the FILE_TREE modifier.

Some things of note:

  • There's one "service" thread per WatchService instance that is inactive unless changes occur in the watched directory. The changes are grouped by introducing a time delay between when they occurred and when they are reported, which is controlled by the sensitivity modifier of the watch service.
  • Since FSEvents API reports directories only, the watch service keeps a snapshot (hierarchical if necessary) of the files in the directory being watched. The snapshot gets updated when an event in that directory or underneath it gets delivered. File changes are detected by comparing "last modified" time with a millisecond precision (BasicFileAttributes.lastModifiedTime()).
  • There is a slight complication with the move of an entire directory hierarchy: FSEvents API only reports about the containing directory of that move and not about any of the directories actually moved. There's a separate test for that (Move.java).
  • The code is careful not to do any I/O (such as reading the contents of a directory or attributes of a file) unless unavoidable. Any deviation from this line should be considered a bug (of, arguably, low priority).
  • The native part consists mostly of API wrappers with one exception of the callback function invoked by the system to report the events that occurred. The sole task of the function is to convert from C strings to Java strings and pass the array of affected directories to Java code. This can be rewritten if desired to make the code more future-proof.
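The snapshot comparison described in the list above can be sketched as follows. This is a minimal illustration, not the code from the PR: the class `SnapshotDiff`, the `Kind` enum, and the `Change` record are hypothetical names, and a snapshot is assumed to be a plain map from file name to "last modified" time in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SnapshotDiff {
    /** Kinds of change the comparison can report (hypothetical names). */
    public enum Kind { CREATE, DELETE, MODIFY }

    /** One detected change: what happened and to which file name. */
    public record Change(Kind kind, String name) {}

    /**
     * Compares two snapshots of a single directory. Each snapshot maps a
     * file name to its "last modified" time in milliseconds.
     */
    public static List<Change> diff(Map<String, Long> before, Map<String, Long> after) {
        List<Change> changes = new ArrayList<>();
        // Names present now: either new, or possibly modified.
        for (Map.Entry<String, Long> e : new TreeMap<>(after).entrySet()) {
            Long old = before.get(e.getKey());
            if (old == null) {
                changes.add(new Change(Kind.CREATE, e.getKey()));
            } else if (!old.equals(e.getValue())) {
                changes.add(new Change(Kind.MODIFY, e.getKey()));
            }
        }
        // Names that were present before but are gone now.
        for (String name : new TreeMap<>(before).keySet()) {
            if (!after.containsKey(name)) {
                changes.add(new Change(Kind.DELETE, name));
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Long> before = Map.of("a.txt", 1000L, "b.txt", 2000L);
        Map<String, Long> after = Map.of("b.txt", 2500L, "c.txt", 3000L);
        diff(before, after).forEach(System.out::println);
    }
}
```

A comparison like this only notices a modification when the timestamp actually changes, so two writes that land within the same millisecond would be indistinguishable to it.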

This commit leaves PollingWatchService unused. I'm not sure if I should/can do anything about it. Any advice is welcomed.

Testing

  • Tested by running test/jdk/java/nio/file and test/jdk/jdk/nio on macOS 10.15.7 (x64) and test/jdk/java/nio/ plus test/jdk/jdk/nio on macOS 12.5.1 (aarch64).
  • Also verified that new tests pass on Linux and Windows.
  • This code (albeit in a slightly modified form) has been in use at JetBrains for around half a year and a few bugs have been found and fixed during that time period.

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8296164 to be approved

Issues

  • JDK-8293067: (fs) Implement WatchService using system library (macOS)
  • JDK-8296164: (fs) introduce system property to choose WatchService on macOS (CSR)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10140/head:pull/10140
$ git checkout pull/10140

Update a local copy of the PR:
$ git checkout pull/10140
$ git pull https://git.openjdk.org/jdk pull/10140/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10140

View PR using the GUI difftool:
$ git pr show -t 10140

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10140.diff

@bridgekeeper

bridgekeeper bot commented Sep 2, 2022

👋 Welcome back mkartashev! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 2, 2022
@openjdk

openjdk bot commented Sep 2, 2022

@mkartashev The following label will be automatically applied to this pull request:

  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge

mlbridge bot commented Sep 2, 2022

@AlanBateman
Contributor

Thanks for this PR. I plan to review it, but it will take some time. Just to set expectations: the patch will likely go through several iterations, as there will be many issues to discuss.

@AlanBateman
Contributor

This commit leaves PollingWatchService unused. I'm not sure if I should/can do anything about it. Any advice is welcomed.

I think it is still used on AIX, but there is someone from IBM working on that. It may be that we will have to keep it for new ports. For macOS, we might have to introduce a system property to use it, at least until there is confidence in the new implementation.

@mkartashev
Member Author

For macOS then we might have to introduce a system property to use it, at least until there is confidence with the new implementation.

I actually had that property, but pulled it out before submitting the pull request, so it's going to be very easy to re-introduce it.
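For illustration, selecting an implementation through a system property could look roughly like the sketch below. The property name `example.watchservice.impl` and the `Impl` enum are hypothetical; the thread does not state the name of the property that was removed or the one proposed in the CSR.

```java
public class WatchServiceSelector {
    /** Hypothetical property name; the thread does not state the real one. */
    public static final String PROPERTY = "example.watchservice.impl";

    /** The two implementations discussed in the thread. */
    public enum Impl { FSEVENTS, POLLING }

    /**
     * Returns the implementation requested via the system property,
     * defaulting to the FSEvents-based one when the property is unset
     * or has an unrecognized value.
     */
    public static Impl select() {
        String value = System.getProperty(PROPERTY, "fsevents");
        return "polling".equalsIgnoreCase(value) ? Impl.POLLING : Impl.FSEVENTS;
    }

    public static void main(String[] args) {
        System.out.println("Selected: " + select());
    }
}
```

With this sketch, running with `-Dexample.watchservice.impl=polling` would fall back to the polling implementation.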

@bplb
Member

bplb commented Sep 7, 2022

In our CI system test/jdk/java/nio/file/WatchService/Move.java appears to time out consistently, although I have not seen it happen on my local machine.

@mkartashev
Member Author

In our CI system test/jdk/java/nio/file/WatchService/Move.java appears to time out consistently, although I have not seen it happen on my local machine.

Can you provide the .jtr file, please? I haven't seen the test hang, but by inserting different delays different systems can expose synchronization issues, of course.

@bplb
Member

bplb commented Sep 8, 2022

Can you provide the .jtr file, please? I haven't seen the test hang, but by inserting different delays different systems can expose synchronization issues, of course.

I can't give the whole .jtr file due to internal content, but here are some excerpts.

/System/Volumes/Data/mesos/work_dir/jib-master/install/2022-09-07-1757010.brian.burkhalter.jdk/src.full/open/test/jdk/java/nio/file/WatchService/Move.java:39: warning: ExtendedWatchEventModifier is internal proprietary API and may be removed in a future release
import com.sun.nio.file.ExtendedWatchEventModifier;
                       ^
/System/Volumes/Data/mesos/work_dir/jib-master/install/2022-09-07-1757010.brian.burkhalter.jdk/src.full/open/test/jdk/java/nio/file/WatchService/Move.java:110: warning: ExtendedWatchEventModifier is internal proprietary API and may be removed in a future release
                    new WatchEvent.Kind<?>[]{ ENTRY_CREATE, ENTRY_DELETE },  ExtendedWatchEventModifier.FILE_TREE);
                                                                             ^
2 warnings
result: Passed. Compilation successful
execStatus=Error. Agent error\: java.lang.Exception\: Agent 5 timed out with a timeout of 480 seconds; check console log for any additional details
test result: Error. Agent error: java.lang.Exception: Agent 5 timed out with a timeout of 480 seconds; check console log for any additional details

The node in question is a MacPro6_1 running Mac_OS_X_12.4 on an Intel_R__Xeon_R__CPU_E5-1620_v2___3.70GHz with total memory 17179869184 (16.00 GB).

Sadly, with timeouts, there is often little information to go on.

@mlbridge

mlbridge bot commented Sep 9, 2022

Mailing list message from Michael Hall on nio-dev:

For me.

move subtree elapsed 1035
move file elapsed 2018

Elapsed times in millis

@AlanBateman
Contributor

I did a first pass over this. It's a good start and demonstrates that the file system events / FSEventStream* API can be used to implement WatchService.

It's really unfortunate that it requires calling CFRunLoopRun to do the run loop, as that means an upcall and JNI code that we would normally try to keep out of this area. One concern is that there are a lot of UnixPath (bytes) <-> String <-> CFString conversions going on, and these will need time to work through (even with UTF-8). Also, CFRunLoopThread will need to be an InnocuousThread to avoid inheriting inheritable thread locals or the context class loader.
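The concern about inherited state can be illustrated with public API only. The sketch below is not InnocuousThread (an internal JDK class); it is an approximation, under the assumption that opting out of inheritable thread locals and clearing the context class loader captures the two properties mentioned above. The class and method names are hypothetical.

```java
public class IsolatedThreadDemo {
    static final InheritableThreadLocal<String> CONTEXT = new InheritableThreadLocal<>();

    /**
     * Creates a daemon thread that does not inherit inheritable thread
     * locals from its creator and has no context class loader.
     */
    public static Thread newIsolatedThread(String name, Runnable task) {
        // The five-argument constructor (Java 9+) lets the caller opt out of
        // inheriting InheritableThreadLocal values from the creating thread.
        Thread t = new Thread(null, task, name, 0, false);
        t.setContextClassLoader(null);
        t.setDaemon(true);
        return t;
    }

    /** Runs a probe on an isolated thread and returns what it saw in CONTEXT. */
    public static String probe() {
        CONTEXT.set("caller-state");
        String[] seen = new String[1];
        Thread t = newIsolatedThread("watch-service-demo", () -> seen[0] = CONTEXT.get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Value seen by isolated thread: " + probe());
    }
}
```

A thread created with the ordinary `new Thread(task)` constructor would instead see the creator's value in `CONTEXT`, which is the kind of leak the review comment warns about.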

One architectural issue is that the WatchService implementation should not be using Path.of or Files.* methods, as that will cause a loop when interposing on the default provider. The provider implementation should only use the Unix provider methods directly.

The use of toRealPath().normalize() looks odd; I would expect the normalize to be a no-op here. This is an area that will require careful study, as it's just not clear how macOS works when the sym link is to another directory on another file system.

Look at ptr_to_jlong and jlong_to_jptr to go between pointers and jlong; that should eliminate some of the casts.

The code includes SUBTREE support. I had hoped this wouldn't be included initially, as it adds a bit more complexity and could be added later.

Style-wise, the code is very different from the existing code and I think we'll need to do a few cleanup passes (overly long lines, naming, formatting, overuse of final, ... all minor stuff, and there is a lot to go through before these things).

@mkartashev
Member Author

@bplb Thanks for the log. It's really strange, though, that there is nothing from System.out because the test itself logs intensively even before any WatchService code is executed. Also, AFAIK timeouts trigger thread dump, which would be quite helpful. Are you not at liberty to share those as well?

@mlbridge

mlbridge bot commented Sep 12, 2022

Mailing list message from Michael Hall on nio-dev:

Excuse my unofficial interest. But I had attempted a kqueue-based OS/X watch service some time back that failed, so it's interesting to see one working.
Mine would stop receiving kernel events running the LotsOfEvents test. It never got past that.
Checking LotsOfEvents out of curiosity, I notice its timing in millis is...
overflow elapsed 32200
queuing elapsed 24292

While again the Move test runs in like...
move subtree elapsed 1048
move file elapsed 2018

So why would tests consistently time out in Move runs when LotsOfEvents takes longer for me? I am of course unfamiliar with the JDK testing but this seems odd.

@mkartashev
Member Author

@AlanBateman
Thank you for taking time to review this, much appreciated. I made some changes based on your feedback; see inlined comments below.

It's really unfortunate that it requires calling CFRunLoopRun to do the run loop as that means an upcall and JNI code that we would normally try to keep out of this area. One concern is that there is a lot of UnixPath (bytes) <-> String <-> CFString conversions going on and these will need time to work through (even with UTF-8). Also CFRunLoopThread will need to be an InnocuousThread to avoid inheriting inheritable thread locals or context class loader.

Done, CFRunLoopThread is now InnocuousThread.

One architectural issue is that the WatchService implementation should not be using Path.of or Files.* methods, as that will cause a loop when interposing on the default provider. The provider implementation should only use the Unix provider methods directly.

Duly noted and, I think, corrected.

The use of toRealPath().normalize() looks odd; I would expect the normalize to be a no-op here. This is an area that will require careful study, as it's just not clear how macOS works when the sym link is to another directory on another file system.

I may be misinterpreting you here, but to me it seems toRealPath() is actually necessary. Let's say the watch root is a directory named "../symlink" that points to "/Users/maxim/work/tests/dir-watcher/". The FSEvents API will report strings denoting absolute path names like "/Users/maxim/work/tests/dir-watcher/subdir" and in order to be able to remove the common prefix (watch root), one has to make that watch root into an unambiguous absolute path name.
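The prefix removal described in this reply can be sketched with plain path arithmetic. This is a standalone illustration, not the provider code: as noted earlier in the review, the provider itself should avoid Path.of, and the class and method names here are hypothetical. The sketch assumes the watch root has already been resolved to an absolute, symlink-free path (for example via toRealPath()).

```java
import java.nio.file.Path;

public class WatchRootRelativizer {
    /**
     * Strips the watch root from an absolute path reported by the event
     * source. The root must already be absolute and symlink-free for the
     * prefix comparison to be meaningful.
     */
    public static Path relativeToRoot(Path realRoot, String reportedPath) {
        Path reported = Path.of(reportedPath);
        if (!reported.startsWith(realRoot)) {
            throw new IllegalArgumentException(reported + " is not under " + realRoot);
        }
        return realRoot.relativize(reported);
    }

    public static void main(String[] args) {
        Path realRoot = Path.of("/Users/maxim/work/tests/dir-watcher");
        System.out.println(
            relativeToRoot(realRoot, "/Users/maxim/work/tests/dir-watcher/subdir"));
    }
}
```

If the root were left as "../symlink", the startsWith check would fail for every reported path, which is the ambiguity the reply is pointing at.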

Look at ptr_to_jlong and jlong_to_jptr to go between pointers and jlong; that should eliminate some of the casts.

All such cases were eliminated.

The code includes SUBTREE support. I had hoped this wouldn't be included initially, as it adds a bit more complexity and could be added later.

The recursive subdirectory watch support was removed (for now); let's focus on the initial support for FSEvents.

Style-wise, the code is very different from the existing code and I think we'll need to do a few cleanup passes (overly long lines, naming, formatting, overuse of final, ... all minor stuff, and there is a lot to go through before these things).

I also reformatted the native code and a bit of Java, mostly to shorten the lines. The rest will have to be on a case-by-case basis, I'm afraid.

This time around I only ran the test/jdk/java/nio/file/WatchService tests on macOS 10.15.

@bplb
Member

bplb commented Sep 13, 2022

@bplb Thanks for the log. It's really strange, though, that there is nothing from System.out because the test itself logs intensively even before any WatchService code is executed. Also, AFAIK timeouts trigger thread dump, which would be quite helpful. Are you not at liberty to share those as well?

@mkartashev When a test times out, often not a lot of information is captured, including System.out and thread dumps. It can be frustrating to debug.

@bplb
Member

bplb commented Sep 13, 2022

I reran the java/nio/file/WatchService tests with the current code and there were no failures this time. I wonder whether this might be due to the change from CFRunLoopThread to InnocuousThread? I had some similar timeouts with the kqueue-based prototype I developed which I think were due to problems with thread cleanup.

@mkartashev
Member Author

I reran the java/nio/file/WatchService tests with the current code and there were no failures this time. I wonder whether this might be due to the change from CFRunLoopThread to InnocuousThread?

Thanks! Let's hope it stays this way.

BTW, my initial version did have a thread cleanup problem that was caught by other tests. That was because I assumed the run loop would exit as soon as all input sources are pulled from it; that assumption proved to be false, so now there's MacOSXWatchService.runLoopStop().

@mlbridge

mlbridge bot commented Sep 14, 2022

Mailing list message from Brian Burkhalter on nio-dev:

On Sep 12, 2022, at 4:24 AM, Michael Hall <mik3hall at gmail.com> wrote:

I had attempted a kqueue based OS/X watch service sometime back that failed, so it's interesting to see one working.

Concerning kqueue, please see my comment here:

https://bugs.openjdk.org/browse/JDK-7133447?focusedCommentId=14522734&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14522734

Brian

@mlbridge

mlbridge bot commented Sep 14, 2022

Mailing list message from Michael Hall on nio-dev:

I suppose it could have been the number of file descriptors. Do you remember specific problems with the LotsOfEvents test? I remember the Java code would block waiting for something to happen. The kqueue native thread would loop on its own thread waiting for events, but the kernel would just stop sending them.

It's been a while since I looked at the code, so I don't remember if I had any cleanup that would release file descriptors. I probably should have. But I don't remember under what conditions it ran if I did.

* Invoked on the CFRunLoopThread by the native code to report directories
* that need to be re-scanned.
*/
private void callback(final long eventStreamRef,
Member

Could this be named something more descriptive than callback, perhaps even handleEvents even though there is a method of the same name defined in MacOSXWatchKey?

Member Author

Sure.


if (!relativeRootPath.equals(path)) {
// Ignore events from subdirectories for now.
continue;
Member

Should a line

eventFlagsPtr += SIZEOF_FS_EVENT_STREAM_EVENT_FLAGS;

be added here? (The definition of SIZEOF_FS_EVENT_STREAM_EVENT_FLAGS would have to be moved up.)

Member Author

Yes, most certainly. Thanks for catching this!
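The bug caught in this exchange is a general one: when paths and flags are walked in parallel, the flags offset has to advance on skipped entries too. The sketch below reproduces the pattern with a ByteBuffer standing in for native memory; the class name, the method, and the 32-bit size of a flags entry are assumptions made for the illustration, not details taken from the PR.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ParallelArrayWalk {
    /** Size of one flags entry; assumed to be a 32-bit value for this sketch. */
    static final int SIZEOF_FLAGS = Integer.BYTES;

    /**
     * Returns the flags of every path equal to root. The offset into the
     * flags buffer advances on every iteration, including skipped ones, so
     * that each path and its flags stay aligned.
     */
    public static List<Integer> flagsForRoot(String root, String[] paths, ByteBuffer flags) {
        List<Integer> result = new ArrayList<>();
        int offset = 0;
        for (String path : paths) {
            if (!root.equals(path)) {
                // Ignore events from subdirectories, but still step past
                // their flags; otherwise later flags are misattributed.
                offset += SIZEOF_FLAGS;
                continue;
            }
            result.add(flags.getInt(offset));
            offset += SIZEOF_FLAGS;
        }
        return result;
    }

    public static void main(String[] args) {
        ByteBuffer flags = ByteBuffer.allocate(3 * SIZEOF_FLAGS);
        flags.putInt(0, 0x100).putInt(4, 0x200).putInt(8, 0x400);
        String[] paths = { "root/sub", "root", "root" };
        System.out.println(flagsForRoot("root", paths, flags));
    }
}
```

Dropping the increment inside the `if` branch would make the first matching path read the flags that belong to the skipped subdirectory event.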

@mkartashev
Member Author

What I meant in my earlier comment was the tests failed with the same errors/timeouts as previously. There was no InternalError thrown.

Thanks! No luck, then...

@jaikiran
Member

What I meant in my earlier comment was the tests failed with the same errors/timeouts as previously. There was no InternalError thrown.

Thanks! No luck, then...

Would using the non-deprecated API be something worth investigating? In the meantime, I will try and find some more time to see if we can narrow down these failures on the systems where this is failing. I don't know how soon I can get to that.

@mlbridge

mlbridge bot commented Nov 30, 2022

Mailing list message from Michael Hall on nio-dev:

On Nov 29, 2022, at 1:15 AM, Jaikiran Pai <jpai at openjdk.org> wrote:

That isn't too surprising because I use the same infrastructure as Alan to test this.

I assume this infrastructure is not generally available?
Not being able to reproduce the errors could make fixing very difficult.

@mlbridge

mlbridge bot commented Nov 30, 2022

Mailing list message from Michael Hall on nio-dev:

On Nov 13, 2022, at 9:31 AM, Alan Bateman <alanb at openjdk.org> wrote:

Unfortunately there is a lot of timeouts and intermittent failures and across quite a range of macOS releases (from 10.15 to 12.2).

Are there any errors after 12.2. I am 12.6. I am not sure what Maxim is. Is it possible it was an Apple issue that is fixed in recent releases?

@mlbridge

mlbridge bot commented Nov 30, 2022

Mailing list message from Maxim Kartashev on nio-dev:

FWIW I'm on 10.15

@mkartashev
Member Author

Would using the non-deprecated API be something worth investigating?

That's on my to-do list.

In the meantime, I will try and find some more time to see if we can narrow down these failures on the systems where this is failing. I don't know how soon I can get to that.

Thanks anyway, much appreciated!

@mlbridge

mlbridge bot commented Dec 1, 2022

Mailing list message from Michael Hall on nio-dev:

I guess we're lucky or there's something different about the failing infrastructure.

@mlbridge

mlbridge bot commented Dec 7, 2022

Mailing list message from Michael Hall on nio-dev:

Another, maybe remote, possibility - if it's not the testing framework, or a fixed Apple bug, possibly it's some quirk in different Xcode versions being used to build the JDK?

https://git.openjdk.org/jdk/pull/11115
appears to correct the Xcode build issues I ran into earlier for Xcode 14.

@bridgekeeper

bridgekeeper bot commented Jan 4, 2023

@mkartashev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper

bridgekeeper bot commented Feb 1, 2023

@mkartashev This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Feb 1, 2023
@mlbridge

mlbridge bot commented Feb 2, 2023

Mailing list message from Michael Hall on nio-dev:

Is there any chance this will be made available in some other way?

@mkartashev
Member Author

mkartashev commented Feb 3, 2023

Is there any chance this will be made available in some other way?

JetBrains Runtime has FSEvents-based implementation of WatchService on macOS.

@AlanBateman
Contributor

JetBrains Runtime has FSEvents-based implementation of WatchService on macOS.

Is this similar to what was proposed here? I'm curious whether the same issues were encountered when running the tests across a wider range of macOS releases.

@mkartashev
Member Author

@AlanBateman It is the implementation this one was based upon. No known bugs exist there at this moment in time (tests were run on IntelliJ infrastructure, of course).

@bplb
Member

bplb commented Feb 3, 2023

@mkartashev Do you intend to continue work on this PR? If not, then we could potentially take over the work if that is acceptable.

@mlbridge

mlbridge bot commented Feb 3, 2023

Mailing list message from Michael Hall on nio-dev:

Thanks,
I have code that mostly passes through to the default OS/X file system provider but includes some extensions to expose some of the OS/X native APIs.
It's not getting a lot of use that I know of, but I run some other code off of it myself.
It lacks a WatchService, my own attempt at a kqueue one for that coming up short.
So no actual native WatchService, which seemed a shame.
I may try to make use of yours.

@bplb
Member

bplb commented Feb 3, 2023

It lacks a WatchService, my own attempt at a kqueue one for that coming up short.

As was previously discussed (I think) I don't believe that a kqueue implementation is feasible as it will eventually run out of file descriptors. This is specifically due to MODIFY which cannot be detected without having an open file descriptor for each monitored file.

@mkartashev
Member Author

@mkartashev Do you intend to continue work on this PR? If not, then we could potentially take over the work if that is acceptable.

I plan to have a shot at using newer (not declared deprecated) API for scheduling the event stream. If that fails, this implementation is up for grabs for all I care.

@mlbridge

mlbridge bot commented Feb 4, 2023

Mailing list message from Michael Hall on nio-dev:

This was some time ago and I have no current intention to continue work on that, although I'm not sure I ever removed the code from the project.

@mlbridge

mlbridge bot commented Feb 4, 2023

Mailing list message from Michael Hall on nio-dev:

As I remember, working on this was difficult because Maxim and I weren't able to reproduce whatever errors the JDK testing framework was getting.

If you say you are done, and the PR has been closed by openjdk, I'll grab a copy. It would probably be easier than trying to extract it from the GitHub project you posted.

How will your changes be tested if the PR has been closed?

@mkartashev
Member Author

How will your changes be tested if the PR has been closed?

I guess I'll re-open it, perhaps as a new one if necessary. As you correctly noted, I can't test this fully by myself so I have to make the code available to others somehow.

@bplb
Member

bplb commented Feb 9, 2023

[...] I can't test this fully by myself so I have to make the code available to others somehow.

I think that we can help with testing.

@mlbridge

mlbridge bot commented Feb 16, 2023

Mailing list message from Michael Hall on nio-dev:

I decided to try and do a standalone version along with my OS/X default FileSystemProvider project.

Just finished throwing something together for that.

I based it on the JetBrains runtime since the ownership status here seems still sort of undetermined.

If this isn't the place to discuss this, let me know. If that is the case, would you be interested at all in discussing it off-list @Maxim?

My first real test with no debugging is getting...

Exception in thread "FileSystemWatcher" Exception in thread "FileSystemWatcher" java.lang.InternalError: platform encoding not initialized
at us.hall.trz.osx.MacOSXWatchService.CFRunLoopRun(Native Method)
at us.hall.trz.osx.MacOSXWatchService$CFRunLoopThread.run(MacOSXWatchService.java:201)

Googling the error turns up occurrences, but nothing I really understand as connecting to what I'm doing.

Any thoughts on this?

@mlbridge

mlbridge bot commented Feb 16, 2023

Mailing list message from Michael Hall on nio-dev:

On Feb 15, 2023, at 5:36 PM, Michael Hall <mik3hall at gmail.com> wrote:

Exception in thread "FileSystemWatcher" Exception in thread "FileSystemWatcher" java.lang.InternalError: platform encoding not initialized

I had to add a JNI_OnLoad to resolve jvm references which seemed to work. Initializing the encoding from there also seems to work.
The code now in fact does seem to work. Not a lot of debugging required.
I'll try to get my GitHub project updated with maybe improved markdown.
But for me, now with mine, this seems to work just fine.

@mlbridge

mlbridge bot commented Feb 18, 2023

Mailing list message from Michael Hall on nio-dev:

Still a little thrown together but I have updated my GitHub project for this.

https://github.com/mik3hall/trz

My original code is a custom default FileSystemProvider. It is almost all pass-through, except I added some OS/X-specific FileAttributeViews for file-related native APIs. For your code this means you are basically a plugin WatchService replacing the default platform polling one for whatever JDK you choose to use it on. No other actual changes to the platform provider are involved.
This JDK needs to include the module jdk.incubator.foreign. I used MemoryAddress to replace the use of Unsafe getInt. A possible way around this, reverting my changes, is mentioned in the README, as are other details of what I remembered needed changing to go from a runtime-based implementation to a JNI one.
I also detail some of what the original code did.

I have now tested with the nio Move, Basic, and a modified extra-heavy-duty LotsOfEvents. It seems to work fine. Surprisingly easily.

So, this should mean you or anyone else who wants to can try this out against any recent OpenJDK version.

If anyone does, I would of course be interested in hearing about it. Especially if there are any problems related to what I've done.

@mlbridge

mlbridge bot commented Feb 18, 2023

Mailing list message from Michael Hall on nio-dev:

And to possibly make it a little easier for anyone who does want to try it I added a release for the first time.
This just contains the jar (non-modular), dylib, and a sample invocation.
All it takes.

@mkartashev
Member Author

For the record: I finally got around to trying the new dispatch queues API. This reduces the complexity of the current implementation quite a bit and passes all tests except one. The obstacle I have not been able to overcome is this: the new implementation consistently fails the test test/jdk/java/nio/file/WatchService/LotsOfCancels.java, which creates many watch services and constantly starts/stops FSEventStreams.

I have not been able to find any solution to this; perhaps we are hitting some OS resource limit, even though not that many streams are active at any given moment (sometimes you can also see the "too many open files" exception from that test when it creates new temporary files).

So far it looks like this new API is a no-go.
