Conventions for predictable File URIs #248

Closed
ruebot opened this issue May 20, 2016 · 10 comments

@ruebot
Member

ruebot commented May 20, 2016

Issue by mjordan
Thursday Feb 26, 2015 at 18:53 GMT
Originally opened as https://github.com/islandora-interest-groups/Islandora-Fedora4-Interest-Group/issues/18


Title (Goal): Predict File URIs
Primary Actor: Developer
Scope: Code-level conventions
Level: High
Story: As a developer, I will need to be aware of conventions for identifying specific types of files that may be associated with a Fedora 4 object using an object's fcdm:hasFile property. These conventions are similar to using 'TN', 'OBJ', and other datastream IDs commonly used across solution packs in Islandora 7.x-1.x.

Examples:

  • As a developer, I need to retrieve the OCR file associated with an object.
  • As a developer, I need to retrieve the previous version of a particular file associated with an object.

Remarks:

@ruebot
Member Author

ruebot commented May 20, 2016

Comment by ksclarke
Thursday Feb 26, 2015 at 19:25 GMT


I wonder if there is a tie-in here for persistent IDs (in whatever format you prefer: ARKs, DOIs, PURLs, etc.)? ARKs, for instance, have a way of specifying hierarchical files in the digital object represented by the ARK (using the ARK's Qualifier).

And I don't think the UUIDs are required for Fedora 4, but just what it uses out of the box? At one point there was a PID minter. The UUIDPathMinter was one option but you could choose to use another and it was configurable through something like:

https://github.com/fcrepo4/fcrepo4/blob/master/fcrepo-webapp/src/main/resources/spring/minter.xml

https://github.com/fcrepo4/fcrepo4/tree/master/fcrepo-mint/src/main/java/org/fcrepo/mint

But, I'm also not sure this wasn't removed in the code cleanup prior to the official release. I know there were issues with the path mapping of these IDs. It looks like it's still in master, but I'm not entirely sure of its status.

Persistent IDs have been on my Islandora wishlist for a while, so perhaps I'm lumping this in where it shouldn't be?

I always use UNT (not an Islandora site, but still) as the example of doing this right (in my opinion): http://digital.library.unt.edu/ark:/67531/metadc813/metadata/

@ruebot
Member Author

ruebot commented May 20, 2016

Comment by awoods
Thursday Feb 26, 2015 at 20:39 GMT


@ksclarke, yes, the pluggable pid-minting is alive and well in the F4 codebase. The "out of the box" UUIDPathMinter is designed with performance in mind... but there are other options available, including a remote HttpPidMinter.

@ruebot
Member Author

ruebot commented May 20, 2016

Comment by daniel-dgi
Thursday Feb 26, 2015 at 21:24 GMT


Is using a predicate for this type of thing too naive of an approach? I'd rather not mess with something that's gonna severely hurt performance just because we want semantics in the path.

@ruebot
Member Author

ruebot commented May 20, 2016

Comment by DiegoPino
Thursday Feb 26, 2015 at 21:39 GMT


@daniel-dgi, if I understand correctly, the path we give a resource is not directly tied to how F4 stores and fetches its resources internally (I remember reading about a hierarchy translator; it's in the code, but I'm not sure it's enabled). If so, performance should not be a problem. So predicates/properties could be a nice way to do this, even more so if they are defined explicitly in an Islandora ontology (love this part!), so developers can use these definitions, classes (objects) and subclasses (associated resources, the old datastreams), to know where to look for a specific resource.
'ark:' is not possible, at least not out of the box: '/' | ':' | '[' | ']' | '|' | '*' can't be part of a local name (RDF). The documentation says a resource also has an identifier (in addition to the path). How is this identifier used externally, or is it not used at all?
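
To make the restriction concrete, here is a minimal sketch of a check for those characters. The character list is copied from this comment and has not been verified against Fedora itself; the function name `forbidden_characters` is made up for illustration.

```python
from typing import List

# Characters the comment above says cannot appear in a local name.
# Assumption: this list comes from the thread, not from a verified spec.
FORBIDDEN = set("/:[]|*")


def forbidden_characters(name: str) -> List[str]:
    """Return the forbidden characters found in a proposed path segment."""
    return sorted({ch for ch in name if ch in FORBIDDEN})


if __name__ == "__main__":
    for candidate in ("ark:", "metadc813", "ark:/67531"):
        print(candidate, forbidden_characters(candidate))
```

A segment such as 'ark:' fails the check because of the colon, which is the point being made about ARK-style paths.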

@ruebot
Member Author

ruebot commented May 20, 2016

Comment by awoods
Thursday Feb 26, 2015 at 21:57 GMT


I definitely prefer the property/predicate approach in conjunction with something along the lines of the FCDM: https://wiki.duraspace.org/display/FF/Fedora+Community+Data+Model as opposed to semantically meaningful URLs.

@ruebot
Member Author

ruebot commented May 20, 2016

Comment by mjordan
Thursday Feb 26, 2015 at 22:21 GMT


@daniel-dgi and @awoods, what would a typical REST conversation look like if the use case was "give me a copy of the file that has been designated as the thumbnail image for the object?"

@ruebot
Member Author

ruebot commented May 20, 2016

Comment by awoods
Thursday Feb 26, 2015 at 23:17 GMT


@mjordan, In conjunction with a structuring along the lines of FCDM:

Ideally, the triples of your repository are indexed in an external triplestore (Fuseki, Sesame, etc). Then you simply make a SPARQL-Query such as:

select ?thumb where {
    <host/collections/{id}> fcdm:hasThumbnail ?thumb .
}
  • Followed by a GET /URL-of-thumbnail
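
A rough sketch of that triplestore lookup in Python, using only the standard library. The `FCDM` namespace URI is a placeholder, and the endpoint URL and function names are assumptions for illustration. The only protocol details relied on are the standard SPARQL 1.1 query-via-GET form and the SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Assumption: placeholder namespace; substitute the real FCDM namespace URI.
FCDM = "http://example.org/fcdm#"


def build_thumbnail_query(container_uri: str) -> str:
    """Return a SPARQL SELECT asking for the container's thumbnail."""
    return (
        f"PREFIX fcdm: <{FCDM}>\n"
        "SELECT ?thumb WHERE {\n"
        f"    <{container_uri}> fcdm:hasThumbnail ?thumb .\n"
        "}"
    )


def thumbnail_uris(results: dict) -> List[str]:
    """Pull the ?thumb bindings out of a SPARQL JSON results document."""
    bindings = results["results"]["bindings"]
    return [b["thumb"]["value"] for b in bindings if "thumb" in b]


def find_thumbnails(endpoint: str, container_uri: str) -> List[str]:
    """Send the query to a SPARQL endpoint and return the thumbnail URIs."""
    params = urllib.parse.urlencode({"query": build_thumbnail_query(container_uri)})
    request = urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return thumbnail_uris(json.load(response))
```

Each URI returned by `find_thumbnails` can then be fetched with a plain GET, so Fedora is only contacted for the file itself.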

If, however, a REST interaction is required, here are some possibilities looking for an object's (container's) thumbnail:

  1. GET /collections/{id}/
       • Parse RDF looking for triple: </collections/{id}/> fcdm:hasThumbnail <URL-of-thumbnail>
  2. GET /URL-of-thumbnail
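
The two-step REST interaction above could be sketched like this. It assumes the container's RDF can be requested as N-Triples via the `Accept` header, and it uses a placeholder `FCDM` namespace URI; both are assumptions, as are the function names. The parser is deliberately simplified: it only recognises triples whose object is an IRI and skips everything else.

```python
import re
import urllib.request
from typing import Callable, Iterator, List, Tuple

# Assumption: placeholder namespace; substitute the real FCDM namespace URI.
FCDM = "http://example.org/fcdm#"

# Matches N-Triples lines whose subject, predicate and object are all IRIs.
IRI_TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def iri_triples(ntriples: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) for IRI-only triples."""
    for line in ntriples.splitlines():
        match = IRI_TRIPLE.match(line)
        if match:
            yield match.group(1), match.group(2), match.group(3)


def objects_of(ntriples: str, subject: str, predicate: str) -> List[str]:
    """Return every object IRI for the given subject and predicate."""
    return [o for s, p, o in iri_triples(ntriples) if s == subject and p == predicate]


def fetch_ntriples(uri: str) -> str:
    """Step 1: GET the resource, asking for N-Triples."""
    request = urllib.request.Request(uri, headers={"Accept": "application/n-triples"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def thumbnail_for(container_uri: str, fetch: Callable[[str], str] = fetch_ntriples) -> List[str]:
    """Parse the container's RDF for fcdm:hasThumbnail; step 2 is a GET on each result."""
    return objects_of(fetch(container_uri), container_uri, FCDM + "hasThumbnail")
```

Passing `fetch` as a parameter keeps the parsing testable without a running repository.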

Alternatively, if more dynamic relationships are in play, the interaction may be more like:

  1. GET /collections/{id}/
       • Parse RDF looking for triples: </collections/{id}/> fcdm:hasRelatedFile <URL-of-file>
  2. For each of the <URLs-of-files>, GET /URL-of-file and parse the RDF for <URL-of-file> a fcdm:Thumbnail
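
A sketch of this more dynamic variant, under the same assumptions as before: a placeholder `FCDM` namespace, RDF served as N-Triples, and a simplified IRI-only parser. The `fetch` argument is a hypothetical callable that returns the N-Triples for a URI. The loop makes the cost visible: one request for the container plus one per related file.

```python
import re
from typing import Callable, List

# Assumption: placeholder namespace; substitute the real FCDM namespace URI.
FCDM = "http://example.org/fcdm#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Matches N-Triples lines whose subject, predicate and object are all IRIs.
IRI_TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def objects_of(ntriples: str, subject: str, predicate: str) -> List[str]:
    """Return every object IRI for the given subject and predicate."""
    found = []
    for line in ntriples.splitlines():
        match = IRI_TRIPLE.match(line)
        if match and match.group(1) == subject and match.group(2) == predicate:
            found.append(match.group(3))
    return found


def find_thumbnails(container_uri: str, fetch: Callable[[str], str]) -> List[str]:
    """Return related files whose own RDF types them as fcdm:Thumbnail."""
    container_rdf = fetch(container_uri)
    thumbnails = []
    for file_uri in objects_of(container_rdf, container_uri, FCDM + "hasRelatedFile"):
        file_rdf = fetch(file_uri)  # one extra request per related file
        if FCDM + "Thumbnail" in objects_of(file_rdf, file_uri, RDF_TYPE):
            thumbnails.append(file_uri)
    return thumbnails
```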

But from a performance perspective, you probably want to hit Fedora as little as possible and instead take advantage of tooling that is optimized for this sort of thing, such as a proper triplestore.

@ruebot
Member Author

ruebot commented May 20, 2016

Comment by mjordan
Friday Feb 27, 2015 at 15:07 GMT


@awoods Thanks, very helpful. I think the "documented agreement on which predicates/vocabularies are used in your model" is really the root of my original question though. Currently in Islandora there are several conventions (either implicit or explicit) that form this agreement - for example, I can't think of any content models that don't use the DSID 'TN' to identify a thumbnail, or any that don't use 'OCR' for the page-level text transcript of a paged document. If I could rephrase my user story, it would be "As a developer, I will need to be aware of an agreed-upon set of RDF predicates for specific types of files associated with an Object/Container that has a given content model."

@ruebot
Member Author

ruebot commented May 20, 2016

Comment by daniel-dgi
Thursday Apr 16, 2015 at 13:43 GMT


@mjordan See https://github.com/Islandora-Labs/islandora/blob/7.x-2.x/docs/technical-documentation/services.md. Let me know what you think. Still WIP (we have a LOT of different datastreams/derivative types that need to be accounted for), but it's a start at fleshing all this out.

@dannylamb
Contributor

Closing old use cases until after MVP doc is released.
