Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Would it be possible/desirable to be able to pickle BIDSLayout objects? #435

Closed
oesteban opened this issue Apr 18, 2019 · 9 comments · Fixed by #432
Closed

Would it be possible/desirable to be able to pickle BIDSLayout objects? #435

oesteban opened this issue Apr 18, 2019 · 9 comments · Fixed by #432

Comments

@oesteban
Copy link
Collaborator

The main use case that leads me to propose this is a Nipype workflow running across multiple processes. Currently, since BIDSLayout is not pickleable, all individual interfaces relying on a layout object will need to instantiate and index a bids root. It would be great to be able to have a shared object that indexes the folder only once (and I think that requires pickling).

@oesteban
Copy link
Collaborator Author

cc @satra to know whether this makes sense from the Nipype perspective

@tyarkoni
Copy link
Collaborator

tyarkoni commented Apr 18, 2019

In 0.9 (still a WIP), the BIDSLayout is backed by a SQLite database. By default it uses an in-memory database, but if you pass a DB file at initialization, it will use that as the store. So I think this should address your needs. (If not, let me know what it misses.)

@oesteban
Copy link
Collaborator Author

That sounds great, making BIDSLayout pickleable in that situation would be fairly easy.

@tyarkoni
Copy link
Collaborator

Are you sure you'd still need to pickle the BIDSLayout once you have a database file? Can't you just pass the same DB filename to all processes (or failing that, just copy the file)? Outside of the indexing process, there's essentially no overhead to re-initializing a BIDSLayout, so as long as you can use an existing DB, my hope was that this would obviate the need for any further serialization...

@oesteban
Copy link
Collaborator Author

Probably re-initializing the layout would be more efficient 👍

@satra
Copy link
Contributor

satra commented Apr 18, 2019

databases/sets and queries should always be re-initialized except to preserve prior state (for reproducibility).

this is why datagrabber and its forms always run, because we have no way of guaranteeing that things have not changed. but if we could do that, say with a db file, we can then hash it.

@tyarkoni
Copy link
Collaborator

tyarkoni commented Apr 18, 2019

We could easily add a read-only option to BIDSLayout. We could even make that the default, as there's no obvious reason why someone would want/need to modify records in the DB itself after indexing.

@satra
Copy link
Contributor

satra commented Apr 18, 2019

@tyarkoni - the most common use case for bids in a lab is in an ongoing study, so they would likely want to reindex as participants or sessions are added. but having a parameter would be very useful. if set to readonly, we can hash, if not, we reindex the layout.

@tyarkoni tyarkoni mentioned this issue Apr 20, 2019
@tyarkoni
Copy link
Collaborator

Okay, this is added in the 0.9 WIP. By default, each BIDSLayout uses its own in-memory SQLite database. But, if you either call .save(path) or initialize with database_file=path, it will use the specified file instead. There is also a reset_database argument that forces reindexing when set to True.

I decided not to add a read-only option, as it's actually not so easy to accidentally write to the DB (you would need to explicitly retrieve the .session from the layout and call commit() on it; there aren't (m)any opportunities to mutate the DB during normal post-indexing operations).

The way I see it, if a user is using a DB file on disk, and is incrementally adding data to the dataset, it's on them to pass reset_database=True. I don't think we want to assume that any changes to the dataset should prompt automatic reindexing (we could potentially add a 'detect' option down the line if there's demand for it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants