Conventions for predictable File URIs #248

ruebot · 2016-05-20T02:08:54Z

Issue by mjordan
Thursday Feb 26, 2015 at 18:53 GMT
Originally opened as https://github.com/islandora-interest-groups/Islandora-Fedora4-Interest-Group/issues/18

Title (Goal)	Predict File URIs
Primary Actor	Developer
Scope	Code-level conventions
Level	High
Story	As a developer, I will need to be aware of conventions for identifying specific types of files that may be associated with a Fedora 4 object using an object's fcdm:hasFile property. These conventions are similar to using 'TN', 'OBJ', and other datastream IDs commonly used across solution packs in Islandora 7.x-1.x.

Examples:

As a developer, I need to retrieve the OCR file associated with an object.
As a developer, I need to retrieve the previous version of a particular file associated with an object.

Remarks:

Related to Basic Camel service for migration #17.
Files in F4 aren't identified by conventional IDs (as datastreams are in F3); they are identified by URIs such as http://localhost:8080/rest/b8/fc/32/be/b8fc32be-34be-428e-bf7c-81be97e5f2e3. Since these URIs are opaque (i.e., they use UUIDs to specify particular resources), developers need a way of knowing which URL to request for a specific purpose, such as retrieving a thumbnail representation of the object or retrieving the OCR transcript associated with an object.
The F4 REST API documentation at https://wiki.duraspace.org/display/FEDORA4x/RESTful+HTTP+API provides examples of File URIs assigned on creation.
URIs don't need to be opaque. The "Creating new binary resource at a specified path" example in the REST API docs illustrates how to assign a URI to a binary resource.
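
As a rough illustration of the last remark, the sketch below builds (but does not send) a PUT request that would place a binary at a chosen, human-readable path. It assumes the local endpoint from the example URI above; the path `collections/demo/obj1/TN` and the helper name `build_put_binary_request` are hypothetical and not part of any agreed convention.

```python
import urllib.request

# Assumed base URL, taken from the example URI in the remarks above.
FEDORA_BASE = "http://localhost:8080/rest"


def build_put_binary_request(path, data, mime_type):
    """Build (without sending) a PUT request that targets a caller-chosen path."""
    url = FEDORA_BASE + "/" + path.strip("/")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": mime_type},
    )


# Hypothetical readable path in place of an opaque UUID-based one.
req = build_put_binary_request(
    "collections/demo/obj1/TN", b"placeholder image bytes", "image/jpeg"
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) is left out on purpose, since it needs a running repository.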

ruebot · 2016-05-20T02:08:55Z

Comment by ksclarke
Thursday Feb 26, 2015 at 19:25 GMT

I wonder if there is a tie-in here for persistent IDs (in whatever format you prefer: ARKs, DOIs, PURLs, etc.)? ARKs, for instance, have a way of specifying hierarchical files in the digital object represented by the ARK (using the ARK's Qualifier).

And I don't think the UUIDs are required for Fedora 4, but just what it uses out of the box? At one point there was a PID minter. The UUIDPathMinter was one option but you could choose to use another and it was configurable through something like:

https://github.com/fcrepo4/fcrepo4/blob/master/fcrepo-webapp/src/main/resources/spring/minter.xml

https://github.com/fcrepo4/fcrepo4/tree/master/fcrepo-mint/src/main/java/org/fcrepo/mint

But, I'm also not sure this wasn't removed in the code cleanup prior to the official release. I know there were issues with the path mapping of these IDs. It looks like it's still in master, but I'm not entirely sure of its status.

Persistent IDs have been on my Islandora wishlist for a while, so perhaps I'm lumping this in where it shouldn't be?

I always use UNT (not an Islandora site, but still) as the example of doing this right (in my opinion): http://digital.library.unt.edu/ark:/67531/metadc813/metadata/

ruebot · 2016-05-20T02:08:55Z

Comment by awoods
Thursday Feb 26, 2015 at 20:39 GMT

@ksclarke, yes, the pluggable pid-minting is alive and well in the F4 codebase. The "out of the box" UUIDPathMinter is designed with performance in mind... but there are other options available, including a remote HttpPidMinter.

ruebot · 2016-05-20T02:08:56Z

Comment by daniel-dgi
Thursday Feb 26, 2015 at 21:24 GMT

Is using a predicate for this type of thing too naive of an approach? I'd rather not mess with something that's gonna severely hurt performance just because we want semantics in the path.

ruebot · 2016-05-20T02:08:56Z

Comment by DiegoPino
Thursday Feb 26, 2015 at 21:39 GMT

@daniel-dgi, if I understand correctly, the path we give a resource is not directly tied to how F4 stores and fetches its resources internally (I remember reading about a hierarchy translator; it's in the code, but I'm not sure whether it's enabled). If so, performance should not be a problem. So predicates/properties could be a nice way to do this, especially if they are defined explicitly in an Islandora ontology (love this part!), so that developers can grab these definitions, classes (objects), and subclasses (associated resources, i.e. the old datastreams) to know where to search for a specific resource.
'ark:' is not possible, at least not out of the box: '/', ':', '[', ']', '|', and '*' can't be part of a local name (RDF). The documentation says a resource also has an identifier (in addition to the path). How is this identifier used externally, if it is used at all?

ruebot · 2016-05-20T02:08:57Z

Comment by awoods
Thursday Feb 26, 2015 at 21:57 GMT

I definitely prefer the property/predicate approach in conjunction with something along the lines of the FCDM: https://wiki.duraspace.org/display/FF/Fedora+Community+Data+Model as opposed to semantically meaningful URLs.

ruebot · 2016-05-20T02:08:57Z

Comment by mjordan
Thursday Feb 26, 2015 at 22:21 GMT

@daniel-dgi and @awoods, what would a typical REST conversation look like if the use case was "give me a copy of the file that has been designated as the thumbnail image for the object?"

ruebot · 2016-05-20T02:08:59Z

Comment by awoods
Thursday Feb 26, 2015 at 23:17 GMT

@mjordan, In conjunction with a structuring along the lines of FCDM:

https://wiki.duraspace.org/display/FF/Fedora+Community+Data+Model , and
https://docs.google.com/document/d/1o-Iq1oKN_W5NXXDQC81pxkhibOz_AhZlY7IShxPTR5M/edit , and
https://docs.google.com/document/d/1RI8aX8XQEk-30-Ht-DaPF5nz_VtI1-eqxUuDvF3nhv0/edit#
I would expect there to be a documented agreement on which predicates/vocabularies are used in your model to represent specific relationships.

Ideally, the triples of your repository are indexed in an external triplestore (Fuseki, Sesame, etc). Then you simply make a SPARQL-Query such as:

select ?thumb where {
    <host/collections/{id}> fcdm:hasThumbnail ?thumb .
}

Followed by a GET /URL-of-thumbnail
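
A minimal sketch of the triplestore route, for illustration only. It builds the query above with an explicit prefix and pulls the `?thumb` values out of a response in the standard SPARQL JSON results format. The namespace `http://example.org/fcdm#` is a placeholder (the real fcdm namespace is not stated in this thread), and the function names and sample response are hypothetical.

```python
import json

# Placeholder namespace: the actual fcdm namespace URI is not given in this thread.
FCDM_NS = "http://example.org/fcdm#"


def thumbnail_query(object_uri):
    """Return a SPARQL SELECT query asking for the object's thumbnail URI."""
    return (
        "PREFIX fcdm: <" + FCDM_NS + ">\n"
        "SELECT ?thumb WHERE {\n"
        "    <" + object_uri + "> fcdm:hasThumbnail ?thumb .\n"
        "}"
    )


def thumbnail_uris(results_json):
    """Extract ?thumb values from a SPARQL JSON results document."""
    doc = json.loads(results_json)
    return [b["thumb"]["value"] for b in doc["results"]["bindings"] if "thumb" in b]


# Hand-written sample response, standing in for what a triplestore might return.
sample = json.dumps({
    "head": {"vars": ["thumb"]},
    "results": {"bindings": [
        {"thumb": {"type": "uri",
                   "value": "http://localhost:8080/rest/collections/1/thumb"}}
    ]},
})

print(thumbnail_query("http://localhost:8080/rest/collections/1"))
print(thumbnail_uris(sample))
```

How the query is submitted (endpoint URL, GET vs. POST) depends on the triplestore in use and is not shown here.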

If, however, a REST interaction is required, here are some possibilities looking for an object's (container's) thumbnail:

GET /collections/{id}/
** Parse RDF looking for triple: </collections/{id}/> fcdm:hasThumbnail <URL-of-thumbnail>
GET /URL-of-thumbnail

Alternatively, if more dynamic relationships are in play, the interaction may be more like:

GET /collections/{id}/
** Parse RDF looking for triples: </collections/{id}/> fcdm:hasRelatedFile <URL-of-file>
For each <URL-of-file>, GET /URL-of-file and parse its RDF looking for the triple: <URL-of-file> a fcdm:Thumbnail

But from a performance perspective, you probably want to hit Fedora as little as possible and instead take advantage of tooling that is optimized for this sort of thing, such as a proper triplestore.
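
The two REST-only interactions above could be sketched roughly as follows, assuming the RDF for each resource is available as N-Triples. The `fetch` callable is a hypothetical stand-in for the HTTP GET, the namespace `http://example.org/fcdm#` is a placeholder, and the line-based parser only handles triples whose object is a URI, so it is not a general RDF parser.

```python
import re

FCDM = "http://example.org/fcdm#"  # placeholder namespace, not the real one
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Matches N-Triples lines of the form: <s> <p> <o> .
_TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def objects(ntriples, subject, predicate):
    """Return URI objects of matching triples; literal and blank-node lines are skipped."""
    found = []
    for line in ntriples.splitlines():
        m = _TRIPLE.match(line)
        if m and m.group(1) == subject and m.group(2) == predicate:
            found.append(m.group(3))
    return found


def find_thumbnail(fetch, container):
    """fetch(uri) must return N-Triples text for that resource (hypothetical helper)."""
    rdf = fetch(container)
    # First interaction: a direct hasThumbnail triple on the container.
    direct = objects(rdf, container, FCDM + "hasThumbnail")
    if direct:
        return direct[0]
    # Second interaction: follow each related file and check its rdf:type.
    for file_uri in objects(rdf, container, FCDM + "hasRelatedFile"):
        if FCDM + "Thumbnail" in objects(fetch(file_uri), file_uri, RDF_TYPE):
            return file_uri
    return None


def nt(s, p, o):
    return "<%s> <%s> <%s> ." % (s, p, o)


# In-memory stand-in for the repository, so the sketch runs without a server.
C = "http://localhost:8080/rest/collections/1"
F1, F2 = C + "/f1", C + "/f2"
store = {
    C: "\n".join([nt(C, FCDM + "hasRelatedFile", F1),
                  nt(C, FCDM + "hasRelatedFile", F2)]),
    F1: nt(F1, RDF_TYPE, FCDM + "File"),
    F2: nt(F2, RDF_TYPE, FCDM + "Thumbnail"),
}

print(find_thumbnail(store.__getitem__, C))
```

Note that the second interaction costs one extra request per related file, which is the performance concern raised above.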

ruebot · 2016-05-20T02:08:59Z

Comment by mjordan
Friday Feb 27, 2015 at 15:07 GMT

@awoods Thanks, very helpful. I think the "documented agreement on which predicates/vocabularies are used in your model" is really the root of my original question though. Currently in Islandora there are several conventions (either implicit or explicit) that form this agreement - for example, I can't think of any content models that don't use the DSID 'TN' to identify a thumbnail, or any that don't use 'OCR' for the page-level text transcript of a paged document. If I could rephrase my user story, it would be "As a developer, I will need to be aware of an agreed-upon set of RDF predicates for specific types of files associated with an Object/Container that has a given content model."
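
To make the rephrased user story concrete, such an agreement could be published as a simple lookup from the familiar DSIDs to predicates. The sketch below is purely illustrative: `fcdm:hasThumbnail` appears earlier in this thread, but `hasOCR` and the namespace `http://example.org/fcdm#` are hypothetical placeholders, not agreed-upon terms.

```python
FCDM = "http://example.org/fcdm#"  # placeholder namespace, not the real one

# Illustrative mapping only; an actual agreement would be defined by the community.
DSID_TO_PREDICATE = {
    "TN": FCDM + "hasThumbnail",  # predicate name used earlier in this thread
    "OCR": FCDM + "hasOCR",       # hypothetical predicate name
}


def predicate_for(dsid):
    """Look up the predicate that plays the role of a legacy datastream ID."""
    try:
        return DSID_TO_PREDICATE[dsid.upper()]
    except KeyError:
        raise KeyError("no agreed predicate for DSID %r" % dsid) from None


print(predicate_for("TN"))
```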

ruebot · 2016-05-20T02:08:59Z

Comment by daniel-dgi
Thursday Apr 16, 2015 at 13:43 GMT

@mjordan See https://github.com/Islandora-Labs/islandora/blob/7.x-2.x/docs/technical-documentation/services.md. Let me know what you think. Still WIP (we have a LOT of different datastreams/derivative types that need to be accounted for), but it's a start at fleshing all this out.

dannylamb · 2016-09-08T20:16:30Z

Closing old use cases until after MVP doc is released.

ruebot added use case labels May 20, 2016

dannylamb closed this as completed Sep 8, 2016
