draft: File ID Manager base implementation #921

dlqqq · 2022-07-13T16:15:36Z

Overview

Draft implementation of File ID service proposed here.

File ID Service maintains a bi-directional mapping between file IDs and file names. This service permits bi-directional lookup, and preserves file ID across filesystem operations done through the ContentsManager.

Adds FileIdManager class with assoc. unit tests
Integrates FileIdManager methods into ContentsManager and AsyncContentsManager filesystem methods
Appends id property that contains the file ID to the Contents API GET/POST/PUT/PATCH /api/contents/{path} operations.
Adds benchmark under jupyter_server/benchmarks/fileidmanager_benchmark.py.

Testing

To test with JupyterLab stable, developers can run the following in a separate conda environment:

git clone [email protected]:jupyter-server/jupyter_server.git
cd jupyter_server/
git remote add dlqqq [email protected]:dlqqq/jupyter_server.git
git pull dlqqq file-id-service
pip install jupyterlab
pip install -e ".[dev,test]"

Benchmarks

To run File ID Manager benchmark:

python jupyter_server/benchmarks/fileidmanager_benchmark.py

Benchmarks run on m5.12xlarge AWS EC2 instance.

% python jupyter_server/benchmarks/fileidmanager_benchmark.py
Index benchmark (separate transactions)
100       files | 0.2932   s
1,000     files | 2.9968   s
Index benchmark (single transaction, atomic INSERTs)
100       files | 0.0032   s
1,000     files | 0.0072   s
10,000    files | 0.0362   s
100,000   files | 0.3430   s
1,000,000 files | 3.5570   s
Index benchmark (single transaction, batched INSERTs)
100       files | 0.2897   s
1,000     files | 0.2663   s
10,000    files | 0.2613   s
100,000   files | 0.2678   s
1,000,000 files | 2.7359   s
Recursive move benchmark
100       files | 0.0065   s
1,000     files | 0.0093   s
10,000    files | 0.0370   s
100,000   files | 0.3499   s
1,000,000 files | 3.6639   s
Recursive copy benchmark
100       files | 0.0106   s
1,000     files | 0.0121   s
10,000    files | 0.0233   s
100,000   files | 0.1632   s
1,000,000 files | 1.5505   s
Recursive delete benchmark
100       files | 0.0033   s
1,000     files | 0.0047   s
10,000    files | 0.0150   s
100,000   files | 0.1359   s
1,000,000 files | 1.4314   s

Zsailer · 2022-07-26T16:17:20Z

jupyter_server/services/contents/fileidmanager.py

+
+
+class FileIdManager(LoggingConfigurable):
+    db_path = Unicode(


I would recommend that we name this trait database_filepath to match the SessionManager:

jupyter_server/jupyter_server/services/sessions/sessionmanager.py

Lines 153 to 161 in e59610b

database_filepath = Unicode(

default_value=":memory:",

help=(

"The filesystem path to SQLite Database file "

"(e.g. /path/to/session_database.db). By default, the session "

"database is stored in-memory (i.e. `:memory:` setting from sqlite3) "

"and does not persist when the current Jupyter Server shuts down."

),

).tag(config=True)

There is also some logic here you could borrow to validate this trait:

jupyter_server/jupyter_server/services/sessions/sessionmanager.py

Lines 163 to 181 in e59610b

@validate("database_filepath")

def _validate_database_filepath(self, proposal):

value = proposal["value"]

if value == ":memory:":

return value

path = pathlib.Path(value)

if path.exists():

# Verify that the database path is not a directory.

if path.is_dir():

raise TraitError(

"`database_filepath` expected a file path, but the given path is a directory."

)

# Verify that database path is an SQLite 3 Database by checking its header.

with open(value, "rb") as f:

header = f.read(100)

if not header.startswith(b"SQLite format 3") and not header == b"":

raise TraitError("The given file is not an SQLite database file.")

return value

Zsailer · 2022-07-26T16:26:25Z

I think we could borrow quite a bit of SQLite connection logic in SessionManager here to avoid duplication of code. We'll need to abstract out some of these pieces and share them across classes.

For example, the definition of the connection, cursor, etc. could be abstracted out of the SessionManager and used by both of these managers:

jupyter_server/jupyter_server/services/sessions/sessionmanager.py

Lines 200 to 228 in e59610b

    
               @property 
        
               def cursor(self): 
        
                   """Start a cursor and create a database called 'session'""" 
        
                   if self._cursor is None: 
        
                       self._cursor = self.connection.cursor() 
        
                       self._cursor.execute( 
        
                           """CREATE TABLE IF NOT EXISTS session 
        
                           (session_id, path, name, type, kernel_id)""" 
        
                       ) 
        
                   return self._cursor 
        
               @property 
        
               def connection(self): 
        
                   """Start a database connection""" 
        
                   if self._connection is None: 
        
                       # Set isolation level to None to autocommit all changes to the database. 
        
                       self._connection = sqlite3.connect(self.database_filepath, isolation_level=None) 
        
                       self._connection.row_factory = sqlite3.Row 
        
                   return self._connection 
        
               def close(self): 
        
                   """Close the sqlite connection""" 
        
                   if self._cursor is not None: 
        
                       self._cursor.close() 
        
                       self._cursor = None 
        
               def __del__(self): 
        
                   """Close connection once SessionManager closes""" 
        
                   self.close()

jupyter_server/services/contents/fileidmanager.py

jupyter_server/serverapp.py

Zsailer · 2022-07-26T17:10:34Z

Thank you for working on this, @dlqqq! I think a File/Document ID will be really useful for a lot of use cases! I'm excited to see this land.

My number one concern about the current implementation is that the file ID isn't strictly unique—i.e. it's just an index in the database. In order to make this service more broadly useful, I think these IDs must be unique.

Let me give a simple example...

This service could be useful for sharing notebooks between two running Jupyter Servers (not necessarily RTC, just basic copying between two servers)—an enhancement many people have asked us for.

In this example, each server has its own file ID database that maps file IDs to their path on the individual filesystems. If we "share" this file from one server to another, I would hope we could use this service to have a (admittedly, weak) link between
these two copies of the document. In Jupyter Server today, we can't create this link, because using name/path wasn't a viable solution given the lack of uniqueness. In this implementation, we also can't create this link, because we can't guarantee uniqueness from these IDs between the two servers. They will likely have conflicting/unavailable indexes in their database since have completely different filesystem structures.

echarles · 2022-07-26T17:29:17Z

Thank you for working on this, @dlqqq!

Yes, thank you!

Let me give a simple example...

Sorry for jumping here without reading the complete history.... have the requirements for distributed systems/servers like jupyterhub be taken into account? (uniqueness, portability across servers...)

dlqqq · 2022-07-27T20:25:36Z

@Zsailer Hey Zach! Thank you for taking the time to review my progress so far. Right now, I'm working on implementing handling for out-of-band file system operations along with a design doc that dives deeper into how the implementation actually works along with some of its shortcomings.

I've addressed a few of your questions below and will address the others once I'm done with my current work. Keep in mind though, the logic for handling out-of-band operations gets a bit more tricky, so it'll likely need a re-review.

I think we could borrow quite a bit of SQLite connection logic in SessionManager here to avoid duplication of code. We'll need to abstract out some of these pieces and share them across classes.

Well, the SQL connection/statement logic is fairly simple and I'm not sure if it's worth abstracting. The SQL statements are all fairly self-contained and only take 2, maybe 3 lines per statement. Over-abstracting and splitting the source over too many different files hinders readability. But I'll take a closer look at this concern when I'm done with my current work.

My number one concern about the current implementation is that the file ID isn't strictly unique—i.e. it's just an index in the database. In order to make this service more broadly useful, I think these IDs must be unique.
...
This service could be useful for sharing notebooks between two running Jupyter Servers (not necessarily RTC, just basic copying between two servers)—an enhancement many people have asked us for.

I'm lost as to why a file ID needs to be, say, "globally unique" to support this use-case. This is just a file copy. If the receiving server needs data (e.g. comments) attached to that copy, the sending server sends that data in tandem. The receiving server is then free to assign a different ID to that new file.

IMO, there's no reason for these two servers to share a file ID for a given file on two separate filesystems because... well, they're not the same file. A copy of a file is a new file. This is how it's currently working in local filesystems. Circling back to commenting, if I add a comment in one file, I should not be adding a comment to another file just because it's a copy of the original and shares the same ID. Shared IDs across files leads to shared state across files, which gets complex really fast.

I'm mainly giving pushback here because using a UUID for the primary key is a performance detriment. I'm not giving a firm "no" here.

@echarles Great questions! Yes, these concerns will be taken into much further study, but only after we handle operations on local filesystems. Could you elaborate more on what you mean by "distributed [Jupyter] servers"? AFAIK JupyterHub is still a single server, so it seems more accurate to refer to it as "remote" rather than "distributed".

Regarding portability and uniqueness, could you elaborate more on these as well? @kevin-bates brought these points up too. Do you mean that file IDs should be globally unique as mentioned earlier? Again, I'm still a bit lost on the use case for this; to my knowledge Jupyter server only runs on a single machine, and a client can't connect to multiple Jupyter servers.

Rough sketch on how to handle remote Jservers/filesystems: The idea is that the ContentsManager implementation for remote Jservers/filesystems should expose methods like dir_exists(), file_exists(), stat(), etc. that the FileIdManager invokes via self.parent.X(). For local filesystems, self.parent refers to the default implementation for local filesystems, FileContentsManager.

echarles · 2022-07-28T05:44:18Z

(disclaimer: I have not taken the needed time to read and digest the dense and very useful discussion around the file id service, but think it is not too early to lay down any requirements)

Great questions! Yes, these concerns will be taken into much further study, but only after we handle operations on local filesystems.

Well, if the design and the technical implementation shows to be fairly different when we would look after, this may be annoying (e.g. if the generated database in case of many/distributed/remote servers is different form le local database). To which extends, not sure, one could argue we don't really care as the local database would be anyway not migrated to any such distributed system. But playing devil advocate, I could imagine my local server has a custom server extension that sends to a corporate backend those IDs and that one day my notebooks would have to be served from JupyterHub. The reverse is also true. A JupyterHub instance could allow the downoad of the notebook that could be run on my local laptop.

In short, I would prefer to cover the JupyterHub case in the design.

Could you elaborate more on what you mean by "distributed [Jupyter] servers"? AFAIK JupyterHub is still a single server, so it seems more accurate to refer to it as "remote" rather than "distributed".

I am thinking to a case where the notebook files are loaded from a shared storage system, e.g. NFS. In that case, the user goes today to server 1 instance, and tomorrow to server 2 instance.

Another case is RTC where a server hosted by JupyterHub will be consumed by multiple users.

So true, "distributed" is a bad name, IMHO a server is always "remote", "ephemeral" may be better but still not very good..

kevin-bates

Hi @dlqqq - thanks for getting this started. Here are some initial comments.

I think there are some things to work out - primarily uniqueness in both the id (UUID vs. Integer) and path (how to deal with different root_dir values). FWIW, I took a stab at converting the implementation to use UUIDs here.

Is it expected that things like "comments", or anything that references a File ID, be in this same database? If not, I think that argues for UUIDs as well - otherwise external references could get a false positive and be "linked" to files other than what is intended.

@vidartf has a good point of perhaps decoupling this from the ContentsManager implementation and using the event bus.

kevin-bates · 2022-08-01T21:32:12Z

jupyter_server/serverapp.py

@@ -1886,9 +1887,9 @@ def init_configurables(self):
            connection_dir=self.runtime_dir,
            kernel_spec_manager=self.kernel_spec_manager,
        )
+        self.file_id_manager = FileIdManager(parent=self, log=self.log)


We should probably define a file_id_manager_class configurable like the other managers define so folks can BYO their own FID manager. This also suggests defining an abstract base class that is then referenced in the klass attribute which allows folks to either derive from FileIdManager or essentially implement their own.

Good catch! I forgot to define FileIdManager as a trait on ServerApp. Before I fix this, is there any reason to prefer

self.file_id_manager_class = Type(klass=FileIdManager) ... self.file_id_manager = self.file_id_manager_class(*args)

over just using

self.file_id_manager = Instance(klass=FileIdManager, args=(*args))

I don't think either is what we want.

Both imply that one has to derive from FileIdManager - which we shouldn't impose on anyone. So klass should reference an ABC (e.g., FileIdManagerABC) that only defines the public APIs as abstract. It could also contain validation logic for the db-path trait, etc.

So the user would invoke the server using --ServerApp.file_id_manager_class = my.package.MyFileIdManager and the trait would ensure that my.package.MyFileIdManager is an instance of FileIdManagerABC and whatever interacts with the file ID manager class would be relegated to the methods defined in the ABC.

Folks that want to slightly adjust the OOTB implementation of FileIdManager (or override all of its methods entirely) are free to do so via subclassing as they are still an instance of the ABC.

For example, with the last set of changes, an entirely different implementation is required if any non-filesystem-based ContentsManager is configured (unfortunately).

jupyter_server/services/contents/fileidmanager.py

jupyter_server/services/contents/filemanager.py

jupyter_server/services/contents/manager.py

dlqqq · 2022-08-02T21:58:31Z

Hey @kevin-bates! Thanks for the review. You can make some excellent points. However, I literally was just about to push my latest changes on top of this PR. I'll do my best to address your comments soon.

To the rest of the team, I wanted to segment design discussion to a separate tracking issue, since this project will likely have to span multiple PRs due to its size. See here: #940

dlqqq · 2022-08-02T21:58:51Z

I'm pushing my latest changes (which track out-of-band filesystem operations) now.

EDIT: oh geez, the commit history is weirdly interweaved with review comments. yikes. LMK if you all want me to rebase and edit the commit dates.

kevin-bates · 2022-08-03T05:46:48Z

jupyter_server/services/contents/fileidmanager.py

+            # uniqueness constraint relaxed here because we need to keep records
+            # of deleted files which may occupy same path
+            "path TEXT NOT NULL, "
+            "ino INTEGER NOT NULL UNIQUE, "


Inode (st_ino) is not unique by itself and needs to be coupled with st_dev to gain uniqueness within a set of devices managed by a given server. I also think we will have issues when mounted file systems are in play (e.g., I think I recall that the inode reflects that of the NFS server for example, not necessarily the client mounting the filesystem, besides other idiosyncrasies that will creep in as these kinds of things vary with OS), not to mention non-file systems like Databases or Cloud Object Storage, etc. So this feels like a bit of a slippery slope, IMHO.

Ah... right. Good catch. I'll see what I can do about this. I think by default, the File ID Manager will only support local filesystems and NFS. For other cloud services, it's possible that they already expose a file ID, so the custom File ID manager implementation should be straightforward.

For other cloud services, it's possible that they already expose a file ID, so the custom File ID manager implementation should be straightforward.

I believe this is saying that non-file system-based ContentsManager derivations must implement their own FileIdManager. Is that correct?

Regarding NFS, I wouldn't be surprised if the st_dev value changes if a previously mounted FS is unmounted, then remounted (following the mount of others or even a reboot). (This hunch may be confirmed if I'm reading things correctly.)

I believe this is saying that non-file system-based ContentsManager derivations must implement their own FileIdManager. Is that correct?

If this is the case (prolly it is), the FileIdManager implementation should be configurable (like the ContentManager...). It may already be the case (have not looked at the code, sorry).

@echarles - this is discussed here: #921 (comment) although I don't think we should require that.

Personal benchmarks show a very significant performance gain when calling get_path(). This is because we are no longer scanning all the contents of every indexed-but-moved directory in _sync_all(), but rather intelligently appending the new paths of the directories to check for dirtiness to a deque used in the body of _sync_all(). Also fixes a few bugs (sorry, couldn't split this commit up): - Indexing symlinks always indexes the real path that symlink refers to - Fixes bug with the way new paths were calculated in recursive moves

codecov-commenter · 2022-08-22T17:16:30Z

Codecov Report

Merging #921 (86fcaf3) into main (1ec1aee) will increase coverage by 1.21%.
The diff coverage is 100.00%.

❗ Current head 86fcaf3 differs from pull request most recent head fb8cc40. Consider uploading reports for the commit fb8cc40 to get more accurate results

@@            Coverage Diff             @@
##             main     #921      +/-   ##
==========================================
+ Coverage   71.44%   72.66%   +1.21%     
==========================================
  Files          65       66       +1     
  Lines        7705     8084     +379     
  Branches     1289     1339      +50     
==========================================
+ Hits         5505     5874     +369     
+ Misses       1805     1804       -1     
- Partials      395      406      +11

Impacted Files	Coverage Δ
jupyter_server/pytest_plugin.py	`88.88% <100.00%> (+0.70%)`	⬆️
jupyter_server/serverapp.py	`65.88% <100.00%> (+0.60%)`	⬆️
jupyter_server/services/contents/fileidmanager.py	`100.00% <100.00%> (ø)`
jupyter_server/services/contents/filemanager.py	`72.25% <100.00%> (+0.05%)`	⬆️
jupyter_server/services/contents/handlers.py	`86.55% <100.00%> (ø)`
jupyter_server/services/contents/manager.py	`83.29% <100.00%> (+0.56%)`	⬆️
jupyter_server/auth/identity.py	`82.93% <0.00%> (-7.27%)`	⬇️
jupyter_server/auth/security.py	`75.67% <0.00%> (+0.33%)`	⬆️
... and 5 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

dlqqq · 2022-08-22T17:17:39Z

@kevin-bates Thank you for your suggestions! I addressed as many as I could, but I believe it's best if I start working on separating my work into a standalone server extension. The great thing about this is that we can split up each of your concerns into a separate PR to address them.

dlqqq · 2022-09-07T16:19:30Z

Hey team! This is now migrated into a separate server extension: https://github.com/jupyter-server/jupyter_server_fileid

Thank you all for your review comments! I've left open review questions as issues on that repo.

ellisonbg · 2022-10-11T08:20:22Z

Great discussion, a few comments: * Right now real-time collaboration only work where multiple users access a single Jupyter Server. The case where multiple users access multiple servers isn't handled yet and a lot of other things would have to change to get that working. As such I don't think the File ID service needs to solve this case yet. However, we should look forward to when it would have to deal with that case. * The scoping of a given FileID service is a ContentManager. Thus, if a user has multiple ContentManagers serving different file systems, they would presumably have 1 file ID service for each of this. For example if a user has a local file system and an S3 bucket exposed through 2 ContentManagers, they would have two File ID services. * Related to this case, I can imagine situations where a user wants to move or copy a file between different ContentManager (copy file to S3 for example). In that case, it may help to have UUIDs that are globally unique.

…

On Wed, Jul 27, 2022 at 10:44 PM Eric Charles ***@***.***> wrote: (disclaimer: I have not taken the needed time to read and digest the dense and very useful discussion around the file id service, but think it is not too early to lay down any requirements) Great questions! Yes, these concerns will be taken into much further study, but only after we handle operations on local filesystems. Well, if the design and the technical implementation shows to be fairly different when we would look after, this may be annoying (e.g. if the generated database in case of many/distributed/remote servers is different form le local database). To which extends, not sure, one could argue we don't really care as the local database would be anyway not migrated to any such distributed system. But playing devil advocate, I could imagine my local server has a custom server extension that sends to a corporate backend those IDs and that one day my notebooks would have to be served from JupyterHub. The reverse is also true. A JupyterHub instance could allow the downoad of the notebook that could be run on my local laptop. In short, I would prefer to cover the JupyterHub case in the design. Could you elaborate more on what you mean by "distributed [Jupyter] servers"? AFAIK JupyterHub is still a single server, so it seems more accurate to refer to it as "remote" rather than "distributed". I am thinking to a case where the notebook files are loaded from a shared storage system, e.g. NFS. In that case, the user goes today to server 1 instance, and tomorrow to server 2 instance. Another case is RTC where a server hosted by JupyterHub will be consumed by multiple users. So true, "distributed" is a bad name, IMHO a server is always "remote", "ephemeral" may be better but still not very good.. — Reply to this email directly, view it on GitHub <#921 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAGXUGKMHTL3T2TXJ3XUDLVWIM35ANCNFSM53PJHWXQ> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

-- Brian E. Granger Senior Principal Technologist, AWS AI/ML ***@***.***) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub

kevin-bates · 2022-10-11T14:42:47Z

The scoping of a given FileID service is a ContentManager. Thus, if a
user has multiple ContentManagers serving different file systems, they
would presumably have 1 file ID service for each of this. For example if a
user has a local file system and an S3 bucket exposed through 2
ContentManagers, they would have two File ID services.

(This is why it would have been nice if FileIDs were a function of (and emitted from) the ContentsManager.)
Since the FileID service is now a consumer of ContentsManager events, how will a given FileID service instance determine that the received event was relative to a LargeFileContentsManager, an S3ContentsManager, or a FooContentsManager?

Related to this case, I can imagine situations where a user wants to move
or copy a file between different ContentManager (copy file to S3 for
example). In that case, it may help to have UUIDs that are globally unique.

Yes, but more importantly, is that once objects referencing FileIDs find their way to other systems (via sharing or whatever), they will be incorrectly associated with the wrong files until UUIDs are used as the primary ID.

ellisonbg · 2022-10-11T17:15:37Z

Kevin, I think you are asking the right questions about how the file id service will work across different file systems. At the same time, there are a lot of other issues with how Jupyter Lab/Server work with different and multiple file systems. We plan on tackling those questions after JupyterLab 4 and Jupyter Server 2 are out and that work will include all of the questions you are asking about. For now, we want to make sure that the basic usage cases of commenting, notebook jobs, and RTC work well when files/directories are moved/renamed in JLab 3 and 4.

…

On Tue, Oct 11, 2022 at 7:42 AM Kevin Bates ***@***.***> wrote: - The scoping of a given FileID service is a ContentManager. Thus, if a user has multiple ContentManagers serving different file systems, they would presumably have 1 file ID service for each of this. For example if a user has a local file system and an S3 bucket exposed through 2 ContentManagers, they would have two File ID services. (This is why it would have been nice if FileIDs were a function of (and emitted from) the ContentsManager.) Since the FileID service is now a consumer of ContentsManager events, how will a given FileID service instance determine that the received event was relative to a LargeFileContentsManager, an S3ContentsManager, or a FooContentsManager? - Related to this case, I can imagine situations where a user wants to move or copy a file between different ContentManager (copy file to S3 for example). In that case, it may help to have UUIDs that are globally unique. Yes, but more importantly, is that once objects referencing FileIDs find their way to other systems (via sharing or whatever), they will be incorrectly associated with the wrong files <jupyter-server/jupyter_server_fileid#3 (comment)> *until* UUIDs are used as the primary ID. — Reply to this email directly, view it on GitHub <#921 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAGXUAY6QGRSTRSAP7J3V3WCV4HHANCNFSM53PJHWXQ> . You are receiving this because you were mentioned.Message ID: ***@***.***>

-- Brian E. Granger Senior Principal Technologist, AWS AI/ML ***@***.***) On Leave - Professor of Physics and Data Science, Cal Poly @ellisonbg on GitHub

kevin-bates · 2022-10-11T18:59:57Z

I see, thank you for your response. Just to be clear, I'm not as worried about the filesystem issues as I am about avoiding data integrity emergencies - per my second point above - which we can continue discussing in the linked issue.

dlqqq added 2 commits July 13, 2022 16:11

add FileIdManager class and assoc. tests

9da7b46

integrate FileIdManager into ContentsManager implementations

86fcaf3

afshin mentioned this pull request Jul 19, 2022

Weekly Team Meetings: Jul-Dec 2022 jupyterlab/frontends-team-compass#152

Closed

kevin-bates mentioned this pull request Jul 21, 2022

Meeting Notes 2022 jupyter-server/team-compass#15

Closed

support out-of-band operations for FID.index/get_id

39998d7

Zsailer reviewed Jul 26, 2022

View reviewed changes

Zsailer added the enhancement label Jul 26, 2022

Zsailer added this to the Future milestone Jul 26, 2022

Zsailer reviewed Jul 26, 2022

View reviewed changes

jupyter_server/services/contents/fileidmanager.py Show resolved Hide resolved

Zsailer reviewed Jul 26, 2022

View reviewed changes

jupyter_server/serverapp.py Show resolved Hide resolved

dlqqq added 3 commits July 26, 2022 20:03

handle disjoint moves in move() method

6a5b24d

fix recursive move()

c98e5b6

handle edge case of creating new file at an empty but prev indexed path

57a19a7

dlqqq mentioned this pull request Jul 27, 2022

Document ID service for Jupyter Server/Lab/Notebook jupyterlab/jupyterlab#12614

Closed

dlqqq added 6 commits July 29, 2022 18:56

detect oob moves in get_path()

4e60209

minimize transactions by restricting usage to public methods in FIM

695de46

optimize _index_all() to not commit transactions

bea8dd4

fix get path following oob recursive move

56b7668

only index new directories in _scan_dir()

12e12b5

prefer _stat() over DirEntry.stat() for windows support

7ff6cb9

kevin-bates reviewed Aug 2, 2022

View reviewed changes

verify file existence in get_path()

7b97de8

dlqqq mentioned this pull request Aug 2, 2022

tracking issue: File ID Manager #940

Open

8 tasks

dlqqq changed the title ~~draft: File ID service~~ draft: File ID Manager base implementation Aug 2, 2022

do not accept *args in __init__()

8da5648

kevin-bates reviewed Aug 3, 2022

View reviewed changes

dlqqq mentioned this pull request Aug 22, 2022

new repo for File ID server extension jupyter-server/team-compass#33

Closed

28 tasks

Zsailer mentioned this pull request Aug 29, 2022

Emit events from the Contents Service #954

Merged

dlqqq mentioned this pull request Sep 7, 2022

create abstract base class for FileIdManager jupyter-server/jupyter_server_fileid#1

Closed

dlqqq closed this Sep 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

draft: File ID Manager base implementation #921

draft: File ID Manager base implementation #921

dlqqq commented Jul 13, 2022 •

edited

Loading

Zsailer Jul 26, 2022

Zsailer Jul 26, 2022

Zsailer commented Jul 26, 2022

Zsailer commented Jul 26, 2022

echarles commented Jul 26, 2022

dlqqq commented Jul 27, 2022

echarles commented Jul 28, 2022

kevin-bates left a comment

kevin-bates Aug 1, 2022 •

edited

Loading

dlqqq Aug 2, 2022

kevin-bates Aug 2, 2022 •

edited

Loading

dlqqq commented Aug 2, 2022

dlqqq commented Aug 2, 2022 •

edited

Loading

kevin-bates Aug 3, 2022

dlqqq Aug 3, 2022

kevin-bates Aug 3, 2022

echarles Aug 3, 2022 •

edited

Loading

kevin-bates Aug 3, 2022

codecov-commenter commented Aug 22, 2022

dlqqq commented Aug 22, 2022

dlqqq commented Sep 7, 2022

ellisonbg commented Oct 11, 2022 via email

kevin-bates commented Oct 11, 2022

ellisonbg commented Oct 11, 2022 via email

kevin-bates commented Oct 11, 2022

	database_filepath = Unicode(
	default_value=":memory:",
	help=(
	"The filesystem path to SQLite Database file "
	"(e.g. /path/to/session_database.db). By default, the session "
	"database is stored in-memory (i.e. `:memory:` setting from sqlite3) "
	"and does not persist when the current Jupyter Server shuts down."
	),
	).tag(config=True)

	@validate("database_filepath")
	def _validate_database_filepath(self, proposal):
	value = proposal["value"]
	if value == ":memory:":
	return value
	path = pathlib.Path(value)
	if path.exists():
	# Verify that the database path is not a directory.
	if path.is_dir():
	raise TraitError(
	"`database_filepath` expected a file path, but the given path is a directory."
	)
	# Verify that database path is an SQLite 3 Database by checking its header.
	with open(value, "rb") as f:
	header = f.read(100)

	if not header.startswith(b"SQLite format 3") and not header == b"":
	raise TraitError("The given file is not an SQLite database file.")
	return value

draft: File ID Manager base implementation #921

draft: File ID Manager base implementation #921

Conversation

dlqqq commented Jul 13, 2022 • edited Loading

Overview

Testing

Benchmarks

Zsailer Jul 26, 2022

Choose a reason for hiding this comment

Zsailer Jul 26, 2022

Choose a reason for hiding this comment

Zsailer commented Jul 26, 2022

Zsailer commented Jul 26, 2022

echarles commented Jul 26, 2022

dlqqq commented Jul 27, 2022

echarles commented Jul 28, 2022

kevin-bates left a comment

Choose a reason for hiding this comment

kevin-bates Aug 1, 2022 • edited Loading

Choose a reason for hiding this comment

dlqqq Aug 2, 2022

Choose a reason for hiding this comment

kevin-bates Aug 2, 2022 • edited Loading

Choose a reason for hiding this comment

dlqqq commented Aug 2, 2022

dlqqq commented Aug 2, 2022 • edited Loading

kevin-bates Aug 3, 2022

Choose a reason for hiding this comment

dlqqq Aug 3, 2022

Choose a reason for hiding this comment

kevin-bates Aug 3, 2022

Choose a reason for hiding this comment

echarles Aug 3, 2022 • edited Loading

Choose a reason for hiding this comment

kevin-bates Aug 3, 2022

Choose a reason for hiding this comment

codecov-commenter commented Aug 22, 2022

Codecov Report

dlqqq commented Aug 22, 2022

dlqqq commented Sep 7, 2022

ellisonbg commented Oct 11, 2022 via email

kevin-bates commented Oct 11, 2022

ellisonbg commented Oct 11, 2022 via email

kevin-bates commented Oct 11, 2022

dlqqq commented Jul 13, 2022 •

edited

Loading

kevin-bates Aug 1, 2022 •

edited

Loading

kevin-bates Aug 2, 2022 •

edited

Loading

dlqqq commented Aug 2, 2022 •

edited

Loading

echarles Aug 3, 2022 •

edited

Loading