Bulk Ingest #147
Comments
Comment by mjordan Of course I'd vote for BagIt, for the reasons @ruebot mentions. But, I'd be cautious about requiring it since not all sites will have or want to convert their stuff to Bags. Then again, if we're going to require a manifest, requiring BagIt is not all that different.
Comment by daniel-dgi I'm not terribly familiar with BagIt. It's not something that I've dealt with in my work for clients. But at first glance it seems pretty appropriate. METS is another option, I guess. Or we could just use a simple JSON or YAML manifest, but something tells me an actual metadata standard would make people feel better about things. Other than BagIt (which I'm assuming contains all the data in one package), we could probably get away with just dropping the manifest in the watch folder, so long as it details the location of files and the user running the camel process has access to those locations.
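To make the "simple JSON manifest in the watch folder" idea concrete, here is a minimal sketch of the access check it implies. It is not an agreed format: the manifest layout (a top-level `files` list whose entries have a `path` key) and the function name `check_manifest` are assumptions made up for illustration.

```python
import json
import os
from pathlib import Path


def check_manifest(manifest_path: Path) -> list[str]:
    """Return the listed file paths that are missing or unreadable.

    Assumes a hypothetical manifest shape: {"files": [{"path": "..."}, ...]}.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:
        path = Path(entry["path"])
        # The user running the ingest process must be able to read every file.
        if not path.is_file() or not os.access(path, os.R_OK):
            problems.append(str(path))
    return problems
```

An ingest process could run a check like this when a manifest appears and refuse to start if the returned list is non-empty.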
Comment by awoods @daniel-dgi, "holey" bags are also an option if not all of the data is available in the package, with the optional |
Comment by ruebot Adding fcrepo and upgration tags since this could also inform the proposed upgration migration tool discussed on today's Fedora Tech call.
Comment by dmoses I think one of the most common patterns in the Drupal community for batch ingesting is using Feeds. It has a number of support modules for importing XML as well. @mjordan wrote a module a while back. BagIt would be a good choice too and may add predictability to the ingest process.
Comment by daniel-dgi Thanks for being awesome, @dmoses. Feeds seems attractive from a Drupal front end point of view. Could maybe parse rdfxml? Would like to hear what @mjordan has to say about pros/cons of using Feeds and nodes. His module means he's probably got the most experience in that realm of Drupal land. Not the first time bags have come up, either. I'm interested in seeing if we can zip them and use them to replace our hand-rolled format for zip importer. Are bags of bags possible, as well? It would be amazing if we could mimic what we're doing in 1.x batch but with a well defined standard.
Comment by ruebot Serialized bags are totally a thing. Are you thinking of the book and newspaper batch ingest w/r/t the bags in bags idea?
Comment by mjordan @daniel-dgi Bags are agnostic to the content in their 'data' directory and that content's organization, so as @ruebot says, it's legal to have a Bag of Bags. The child Bags would just be serialized into .zip or .tgz files.

To answer your question about nodes in Islandora Feeds, I took that approach because 1) it was easy/I am lazy and 2) it uncouples the steps of importing data and committing that data to the Fedora repo as objects. For example, you can perform various types of QA on the nodes before using Views Bulk Operations to create the Islandora objects, add other datastreams, etc. I wrote that module about two years ago; in fact, I started it at OR2013, with @dmoses, @ruebot and some of the usual suspects sitting right beside me in the back few rows of seats. Now that we have a clear path for Islandora 7.x-2.x, it makes even more sense to create nodes (for obvious reasons) than it did then.

A back of the envelope diagram for using an existing tool like Feeds to manage the import and Bags to wrap file assets might look something like: Feeds creates Drupal nodes that contain F4 object properties (maybe using a Feeds RDF parser?), with pointers to Bags on the Drupal filesystem. Each Bag contains the file assets for an Islandora object. The organization of the content within each Bag would likely be specific to each content model (basic image, newspaper issue, book, etc.). It is legal to also include a (non-Bag) manifest that represents the content model in some way (e.g., OAI-ORE, METS), so we might want to explore that option as well.

Using both Feeds and Bags like this is probably overkill, and preparing the Bags would put an additional burden on content handlers. But, there are a lot of other benefits to Bags that may justify that burden, like built-in checksum generation and packaging. Using holey Bags as @awoods points out would add even more flexibility.
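For readers who have not handled Bags, the checksum and packaging benefits mentioned above can be sketched with the Python standard library alone. This is an illustration, not a proposed Islandora implementation: the helper names (`make_bag`, `verify_bag`, `serialize_bag`) are hypothetical, the bag declaration uses `BagIt-Version: 1.0` as defined in RFC 8493 (older tooling may expect 0.97), and a real workflow would more likely use an existing BagIt library.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> Path:
    """Write a minimal bag: bagit.txt, a data/ payload, and a SHA-256 manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for rel_name, content in sorted(payload.items()):
        target = data_dir / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest entries are "<checksum> <path relative to the bag root>".
        manifest_lines.append(f"{digest}  data/{rel_name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


def serialize_bag(bag_dir: Path) -> Path:
    """Zip the bag so it can be moved around or nested inside another bag."""
    archive = shutil.make_archive(
        str(bag_dir), "zip", root_dir=bag_dir.parent, base_dir=bag_dir.name
    )
    return Path(archive)
```

A Bag of Bags under this sketch would simply be a parent bag whose payload contains the `.zip` files produced by `serialize_bag` for each child.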
Comment by daniel-dgi Maybe we're really talking about two things here? Just using Feeds to import nodes, and then zipped bags as a zip importer replacement? Heck, we could even just accept zip files on our services endpoints and use that to consume entire objects as opposed to the multipart/form-data shenanigans I've got going on right now. Would be nice to use bags in that way since it's a Drupal-agnostic way to move things around. Within Drupal, Feeds definitely seems like a great way to go. Maybe we should make a ticket for someone to dabble around? This is getting interesting :)
Comment by manez My (probably not typical) use case would be vastly improved by a bulk export/ingest interface - some way to pull down a small bunch of objects and their metadata, then upload them to another Islandora site. Sounds like that's something in the Bags wheelhouse? That said, +1 for Feeds being a nice GUI/Drupal-y way to import
Comment by mjordan My (recyclable envelope) diagram used both Feeds and Bags because AFAIK Feeds doesn't deal with file assets in any standardized way and I was assuming that the nodes created by Feeds would have some binary files hanging off them. But, the two could be completely separate. Will jump back into the discussion later, must attend all the meetings now 😞
Comment by daniel-dgi @mjordan Ah, I see. Wasn't thinking about Feeds not being able to handle files.
Comment by dmoses I've got the 7.x-2.x VM downloaded ... you can do files with Feeds. I will investigate and try a proof of concept. Potentially?? it could be another migration tool by parsing the FOXML ... which includes paths to the binaries. Not sure. Will report back.
I know this issue has gone stale, but why close it?
I'm working on migrating issues over. This was a bad migration. The original one is still here: https://github.com/islandora-interest-groups/Islandora-Fedora4-Interest-Group/issues/13
Ah, figured it had moved elsewhere, sorry for the unnecessary ping 😄
Issue by daniel-dgi
Tuesday Feb 03, 2015 at 15:31 GMT
Originally opened as https://github.com/islandora-interest-groups/Islandora-Fedora4-Interest-Group/issues/13
Reformatting this to use the Use Case template.
Remarks: