Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add file system change notifier #783

Closed
wants to merge 6 commits into from

Conversation

davidbrochart
Copy link
Contributor

I think we should adhere to watchfiles's API.

Closes #778.

@blink1073
Copy link
Contributor

I verified that watchfiles is available on conda-forge and has an MIT license.

@codecov-commenter
Copy link

codecov-commenter commented Apr 8, 2022

Codecov Report

Merging #783 (029de4f) into main (02e5f3a) will decrease coverage by 0.37%.
The diff coverage is 87.50%.

❗ Current head 029de4f differs from pull request most recent head 7bd9f99. Consider uploading reports for the commit 7bd9f99 to get more accurate results

@@            Coverage Diff             @@
##             main     #783      +/-   ##
==========================================
- Coverage   70.29%   69.91%   -0.38%     
==========================================
  Files          62       62              
  Lines        7554     7383     -171     
  Branches     1248     1225      -23     
==========================================
- Hits         5310     5162     -148     
+ Misses       1861     1848      -13     
+ Partials      383      373      -10     
Impacted Files Coverage Δ
jupyter_server/services/contents/manager.py 82.57% <75.00%> (-0.15%) ⬇️
jupyter_server/services/contents/filemanager.py 73.17% <100.00%> (+0.38%) ⬆️
jupyter_server/terminal/api_handlers.py 0.00% <0.00%> (-100.00%) ⬇️
jupyter_server/terminal/terminalmanager.py 0.00% <0.00%> (-88.47%) ⬇️
jupyter_server/terminal/__init__.py 0.00% <0.00%> (-79.17%) ⬇️
jupyter_server/terminal/handlers.py 0.00% <0.00%> (-75.00%) ⬇️
jupyter_server/services/kernels/handlers.py 57.44% <0.00%> (-1.07%) ⬇️
jupyter_server/utils.py 61.38% <0.00%> (-1.00%) ⬇️
jupyter_server/extension/manager.py 92.75% <0.00%> (-0.85%) ⬇️
jupyter_server/pytest_plugin.py 83.80% <0.00%> (-0.37%) ⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02e5f3a...7bd9f99. Read the comment docs.

@davidbrochart
Copy link
Contributor Author

I verified that watchfiles is available on conda-forge

I packaged it yesterday 😄

@fcollonval
Copy link
Member

I was about to ask how it compares with watchgod - but actually that is the new name 😄

@davidbrochart
Copy link
Contributor Author

Indeed, but I think now almost everything is done in Rust using this library.

@davidbrochart davidbrochart force-pushed the watchfiles branch 3 times, most recently from b90053b to db610cd Compare April 8, 2022 13:14
@kevin-bates
Copy link
Member

I was about to ask how it compares with watchgod - but actually that is the new name

We use watchdog in EG and Elyra for cache consistency implementations. I'm guessing watchgod (watchfiles) came afterward (based on the play on the name), and I'm curious if there's something that should be avoided about watchdog? Since both EG and Elyra depend on jupyter_server, I'd rather not require our users to install two packages that essentially do the same thing. I'm fine converting the watchdog instances, but would like to better understand why watchfiles is advantageous.

I verified that watchfiles is available on conda-forge and has an MIT license.

Hmm. I see watchdog uses an Apache license. Is this difference a licensing thing? If so, are there also functionality differences that warrant the use of watchfiles over watchdog?

@blink1073
Copy link
Contributor

Apache is fine, I was mainly checking for not GPL

@davidbrochart
Copy link
Contributor Author

@kevin-bates I had a previous experience with watchdog, my conclusion was that it is much better to work with watchgod (and thus watchfiles) in an async context. Also, watchfiles has some nice features such as grouping events, I'm not sure if watchdog has the equivalent.

@davidbrochart
Copy link
Contributor Author

davidbrochart commented Apr 8, 2022

The question here is: if we are importing watchfiles (in the Python meaning), does it mean we are also importing its API as a specification that other contents managers should be compliant with?

@davidbrochart
Copy link
Contributor Author

davidbrochart commented Apr 15, 2022

In JupyterLab I want to check if the contents manager supports watching file changes. For that I need to know if the underlying file system is the local file system:

if isinstance(self.contents_manager, (FileManagerMixin, AsyncFileManagerMixin)):

But this is not ideal since custom contents managers could target the local file system and not derive from FileManagerMixin or AsyncFileManagerMixin.
I think the ContentsManager class should have a watchfiles property returning None by default, that implementations should override if they can offer this service. For instance, our file managers would have:

@property
def watchfiles(self):
    import watchfiles
    return watchfiles.__all__
    # ('watch', 'awatch', 'run_process', 'arun_process', 'Change', 'BaseFilter', 'DefaultFilter', 'PythonFilter', 'VERSION')

Then, the user could inspect the objects returned by this property to know what is supported.
This makes the file system change notifyer a "loose specification" in the contents manager, meaning that it tries to follow watchfiles' API when possible.
An alternative solution would be to fully specify the API in the contents manager, maybe by copying a subset of watchfiles' API, but it seems less flexible. What do you think?

@davidbrochart
Copy link
Contributor Author

In this last commit I added the watchfiles attribute in the contents manager class. In ContentsManager it is implemented as an instance of an empty class, and in FileContentsManager and AsyncFileContentsManager it is the watchfiles module.

@martinRenou
Copy link
Contributor

In this last commit I added the watchfiles attribute in the contents manager class

I am wondering what does that mean in terms of jupyter-server semantic versioning. I guess it means jupyter-server will have to follow watchfiles if it adopts its API? I suppose we don't want to be in a situation where watchfile don't follow semver and breaks on a patch release?

If the watchfiles API is not too complicated, could the ContentManager just implement the methods it provides and privately make use of watchfiles?

@davidbrochart
Copy link
Contributor Author

Yes that's what I meant with "loose specification". On the other hand, even if we implement the methods in ContentsManager and under the hood delegate to watchfiles, we still have no control over changes e.g. in the return values, etc. Or we would have to exactly pin the package, and manually sync with new releases. But maybe it is safer?

@@ -34,6 +34,13 @@
copy_pat = re.compile(r"\-Copy\d*\.")


class NoWatchfilesAPI:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NoWatchfilesAPI should contain the desired APIs without concrete implementations. So it can be used as a type hint for the watchfiles property of ContentsManager

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes so that would go in Martin's direction.

@kevin-bates
Copy link
Member

Thanks for working on this @davidbrochart. I agree with @martinRenou and believe we should introduce some form of abstraction that allows folks the ability to implement their own "watchfiles" integration (in their own ContentsManager). We should also not preclude custom contents managers from wanting to watch remote files either - so filesystem locality should not be relevant - and boils down to does the configured ContentsManager support file-change notification or not?

@davidbrochart
Copy link
Contributor Author

@kevin-bates @martinRenou @trungleduc I added a BaseWatchfiles class, that is currently populated with the following attributes (links to the corresponding watchfiles API):

I believe this is the minimum necessary for now (at least for my use-case 😄 ). We can add more in the future if needed. I also pinned watchfiles==0.13, so that this is a stable API in our contents manager implementations.

@kevin-bates
Copy link
Member

Isn't this just exposing the watchfiles API? I think we need to ask what we need as a notification item and expose that. This is requiring that all implementations implement watchfiles.

What do we need in addition to the file and its corresponding action: added, modified, or deleted?

Are we going to allow subsets of src_root to be monitored or is it all or nothing under src_root? Perhaps we could have a configurable list that specifies the directories to be monitored? And can the kinds of files be specified - *.ipynb, *.py, etc. relative to each monitored directory? If so, we could include the filtering expression in the directory:

c.ContentsManager.watched_directories = ["foo/bar/{*.ipynb, *.py}", "baz/{*.yaml}"]

or, if only src_root, just let the configurable be the filter...

c.ContentsManager.watched_files = ["*.ipynb", "*.py", "*.yaml"]

and non-None values indicate that "watch files" is enabled.

@davidbrochart
Copy link
Contributor Author

Isn't this just exposing the watchfiles API? I think we need to ask what we need as a notification item and expose that. This is requiring that all implementations implement watchfiles.

I think we should adhere to watchfiles' API, but that doesn't mean that all implementations must implement it. Our file contents managers implement all of it, and directly delegates to watchfiles. But other implementations can implement a subset of it. But what they do implement, should conform to the API.

What do we need in addition to the file and its corresponding action: added, modified, or deleted?

Are we going to allow subsets of src_root to be monitored or is it all or nothing under src_root? Perhaps we could have a configurable list that specifies the directories to be monitored? And can the kinds of files be specified - *.ipynb, *.py, etc. relative to each monitored directory? If so, we could include the filtering expression in the directory:

c.ContentsManager.watched_directories = ["foo/bar/{*.ipynb, *.py}", "baz/{*.yaml}"]

or, if only src_root, just let the configurable be the filter...

c.ContentsManager.watched_files = ["*.ipynb", "*.py", "*.yaml"]

and non-None values indicate that "watch files" is enabled.

I think all your suggestions can be implemented with the watch_filter argument of watch().
That is why I think we should adhere to watchfiles' API, there is no need to reinvent the wheel. It is a very well designed library and I doubt we would come up with a better API. And anyway, if we want to use it under the hood at least for our file contents managers, having another layer on top of its original API will probably make things more complicated, don't you think?

@kevin-bates
Copy link
Member

The idea would be to constrain what is exposed via the API, otherwise, folks that don't use watchfiles within their implementation will need to figure out how to implement aspects of watchfiles that don't exist in other offerings because our users will have grown accustomed to and, worse, expect the watchfiles functionality if its exposed in the raw. This is why I asked what we minimally need and only expose that as the "API". We can use whatever functionality we want internally.

@davidbrochart
Copy link
Contributor Author

I get your point, on the other hand it would be nice to offer various level of functionalities depending on the file system. On a local file system, I would expect to be able to use all of watchfiles' functionalities, and on a less capable file system, less functionalities. Maybe we're only talking about being able to watch file changes vs not being able to (and actually I need that in JupyterLab), maybe it's more subtle, like supporting the debounce parameter of watch() (which affects how changes are grouped).
That raises the question of letting the user discover what functionalities are supported. If no functionality at all, calling watch() currently raises an exception, so that's one way. For more subtle functionalities, I was thinking that the user could inspect the signature of watch(), and see e.g. if debounce is present, but I agree that it's not ideal. Maybe we could have a method that would return the list of supported arguments?
Or maybe, as you suggested, we should support a fixed set of functionalities, that would be the minimum that could be supported by all file systems. Thanks for you feedback Kevin!

@blink1073 blink1073 changed the title Add file system change notifyer Add file system change notifier May 6, 2022
@Zsailer
Copy link
Member

Zsailer commented Feb 16, 2023

We spent some time triaging PRs in the Server Meeting today. What's the current status of this PR?

It seems to me that it might be better to offer this as an alternative implementation of the FileContentsManager that could be provided as a separate package/extension. Imposing a "file-watching" cost on all users can be quite expensive, so making it an opt-in feature might be better.

@davidbrochart
Copy link
Contributor Author

TO;DR
(too old; don't remember 😄)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Contents change notifier
8 participants