Export a repository object (node, media, files) #1096
Comments
If a command-line tool external to Drupal is sufficient, try
https://github.com/mjordan/islandora_bagger.
|
What we do at UT-Austin in I-7 is use the bagging feature so our users can request bags for preservation purposes and offsite vaulting. Bags under 2 GB are created via the interface; bags over 2 GB get queued for drush processing and are bagged overnight. We provide the ability to bag ALL datastreams and metadata of the object, and for paged content it will also bag the "pages" and their datastreams and metadata. Our users have also requested the ability to bag selected datastreams. |
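The size-based split described above could be sketched roughly as follows. This is only an illustration, not UT-Austin's code: the function name `route_bag_request`, the example PIDs, and the exact 2 GB cutoff handling (objects at or above the limit are queued) are assumptions.

```python
from collections import deque

# Assumed threshold, based on the "2 GB" figure in the comment above.
SIZE_LIMIT_BYTES = 2 * 1024**3


def route_bag_request(pid, size_bytes, overnight_queue):
    """Bag small objects right away; queue large ones for overnight processing.

    Returns "interface" when the object is under the limit, otherwise appends
    the PID to the queue and returns "queued".
    """
    if size_bytes < SIZE_LIMIT_BYTES:
        return "interface"
    overnight_queue.append(pid)
    return "queued"


# Hypothetical PIDs and sizes, for illustration only.
overnight_queue = deque()
small = route_bag_request("demo:1", 500 * 1024**2, overnight_queue)
large = route_bag_request("demo:2", 5 * 1024**3, overnight_queue)
```

A nightly drush or cron job would then drain `overnight_queue`, which keeps long-running bagging out of the web request.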
UTSC has a similar use case and workflow as noted by @rangel35. A MODS flag indicates which objects can be bagged. A report is generated with PIDs. The objects are exported via the command line using drush. (We have considered adding a PREMIS event on bag creation. As it seemed to complicate the workflow, we did not implement that.) In Islandora 7.x we bag the full atom zip (including versions) with archive context. In one of the storage locations, we aim to do validation of bags as well. In 7.x, we run into problems exporting large objects or collections consistently, so the command line seems to work best. Having the option to bag from the UI and the Islandora API would be nice, as we don't have a way to download the whole object right now. Also, it would be ideal to have an option to ingest from a bag or another export format. |
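For the bag validation mentioned above, a minimal sketch of a payload fixity check is shown here. It assumes a bag with a `manifest-sha256.txt` whose lines have the form `<checksum> <path>`, as BagIt payload manifests do. It is not a complete BagIt validator: it skips `bagit.txt`, tag manifests, and files present on disk but missing from the manifest. The function names and the demo file `data/OBJ.txt` are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algorithm):
    """Hash a file in chunks so large payload files are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, algorithm="sha256"):
    """Return the manifest paths that are missing or whose checksum differs."""
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or file_digest(target, algorithm) != expected.lower():
            failures.append(rel_path)
    return failures


# Demo: build a tiny bag-like directory, check it, then corrupt the payload.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data").mkdir()
    (bag / "data" / "OBJ.txt").write_bytes(b"hello")
    checksum = hashlib.sha256(b"hello").hexdigest()
    (bag / "manifest-sha256.txt").write_text(
        f"{checksum}  data/OBJ.txt\n", encoding="utf-8"
    )
    intact_failures = verify_manifest(bag)
    (bag / "data" / "OBJ.txt").write_bytes(b"tampered")
    tampered_failures = verify_manifest(bag)
```

In practice an existing BagIt library would be a better choice than hand-rolled checks; the sketch only shows what the fixity step amounts to.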
Some preliminary thoughts on a Bagging microservice:
Having a microservice separate from Drupal do the bagging would allow the jobs to run as long as they needed to, eliminating the risk of timing out in front of the user because the bagging is done asynchronously. We'd need to figure out how to allow for different Bag options, but those could possibly be sent as the REST POST request's body or something. @Natkeeran with regard to ingest from a Bag, that is something that users have been requesting for a while. But, with Islandora 8's nice REST interfaces, we can probably figure out how to map the contents of a Bagged object back to the originating components of the node+media fairly easily and push it into Islandora using something like https://github.com/mjordan/claw_rest_ingester. I think using URIs to define what taxonomy terms should be assigned to the reingested object would be useful here as well. |
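To make the "options in the POST body" idea concrete, here is a rough sketch of how a client might build such a request. Everything specific in it is hypothetical: the endpoint URL, the JSON layout, and the option names `serialize` and `include_media` are placeholders, not part of any existing Islandora service. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_bag_request(service_url, node_id, options):
    """Build (but do not send) a POST request carrying Bag options as JSON."""
    payload = json.dumps({"node_id": node_id, "options": options}).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and option names, for illustration only.
request = build_bag_request(
    "http://localhost:8000/bags",
    112,
    {"serialize": "zip", "include_media": True},
)
```

Passing `request` to `urllib.request.urlopen` would submit it; the service could then acknowledge immediately and create the Bag asynchronously.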
@mjordan @Natkeeran I would love to see bags (or zipped bags, really), be the new zip importer format. I don't know how possible that is given how widely bags can vary, but it makes sense to move away from a bespoke format to a more widely adopted one. |
The feature set for the microservice looks good. We can extend it later on the Drupal side to have a flag and queue/cron mechanism. Ingest would be a neat addition, with use cases such as restore from backup, migration and batch ingest from zip. Having ingest from zip can theoretically be seen as bootstrapping Drupal from Fedora as well. Some points to consider:
|
@Natkeeran yes, those are all significant issues, but I see them as out of scope for the Bagit functionality. They are more data modeling issues, aren't they? @dannylamb couldn't agree more. Even if an institution hasn't adopted Bagit widely, the tooling is decent and it is always easier to convert from a standard format than from a bespoke one, especially from a long-term preservation perspective (e.g. the platform tied to the bespoke format hasn't been in use in 20 years....). |
I-8 creates a UUID; couldn't we use that as the PID? Or are you thinking more along the lines of the standard namespace-type PID? |
In order for the creation of Bags to be truly decoupled from the Drupal module One advantage of the batch approach is that since the bagger would be running in a CLI environment, it wouldn't time out like it would if the bags were generated within an HTTP response. |
Did some work on Islandora Bagger over the weekend. It now has a REST API that lets you add a node ID and settings file to a queue. It also has a simple FIFO queue manager, and a console command to process the queue. The original CLI The README explains how it works: as
each request's node ID is added to the queue, along with the path to the settings YAML file (which is the body of the request). In a cron job, you would run the following to process the queue:
which loops through the queue and runs the
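As a generic illustration of the FIFO pattern described here (not Islandora Bagger's actual code, queue format, or commands), the enqueue and process steps could look like this. The function names, the in-memory `deque`, and the stub `create_bag` callback are all assumptions made for the sketch.

```python
from collections import deque


def enqueue(queue, node_id, settings_path):
    """Add one bagging job (node ID plus settings file path) to the end of the queue."""
    queue.append((node_id, settings_path))


def process_queue(queue, create_bag):
    """Run jobs oldest first until the queue is empty; return the processed node IDs."""
    processed = []
    while queue:
        node_id, settings_path = queue.popleft()
        create_bag(node_id, settings_path)
        processed.append(node_id)
    return processed


# Demo with a stub in place of the real bag-creation step.
calls = []
queue = deque()
enqueue(queue, 112, "/tmp/settings_a.yml")
enqueue(queue, 45, "/tmp/settings_b.yml")
processed = process_queue(queue, lambda nid, path: calls.append((nid, path)))
```

A real queue would need to persist between the REST request and the cron run, for example in a file or database, which this in-memory version does not do.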
|
The Robertson Library's RDM project uses Mark Jordan's Islandora Bagger and integration module. We have a BagIt Ansible role which installs our forks of islandora_bagger and islandora_bagger_integration. |
We need to be able to fully export a digital repository object for various use cases, including migration and preservation (AIP/Bags).
*(Advanced: Pull in the full graph of a repository item, i.e., if it has URIs to subjects, pull in those URIs!)
Additional Info:
We probably need a method to ingest the exported object as well.