Build a re-usable File Service #189

Closed
whikloj opened this issue Apr 13, 2016 · 17 comments

@whikloj
Member

whikloj commented Apr 13, 2016

Collections, Basic Image, Video, Audio will all have to hold files or proxies for files.

This ticket is to design a generalized service that could be used from all of these and more.

@DiegoPino
Contributor

👍 fully needed. Will start doing some thinking.

@ruebot
Member

ruebot commented Apr 14, 2016

https://github.com/daniel-dgi/porkpie

I'll leave that there, lest we forget again. :poppy:

@daniel-dgi :poopy:

@br2490

br2490 commented May 18, 2016

Done in PDX

Initial Thoughts
POST/PUT Endpoint should:
accept binary
accept UUID of parent, if empty REJECT (because files at this level should be part of a collection)
accept isPreservationMaster bool (Ontology/Semantic to use for this?)
create UUID for the file (not filesets if moving pcdm:works)
...
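
The checklist above could be sketched roughly as follows. This is only an illustration in Python, not the PDX code: `UploadRequest`, `validate_upload`, and the field names are hypothetical, and the check that the parent UUID is well formed is an added assumption.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UploadRequest:
    """Hypothetical shape of a POST/PUT to the file endpoint."""
    binary: bytes
    parent_uuid: Optional[str] = None
    is_preservation_master: bool = False
    # A UUID is minted for the file itself unless the caller supplies one.
    file_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


def validate_upload(req: UploadRequest) -> UploadRequest:
    """Reject uploads that have no body or no valid parent UUID."""
    if not req.binary:
        raise ValueError("no binary content supplied")
    if not req.parent_uuid:
        raise ValueError("parent UUID is required")
    # Raises ValueError if the parent id is not a well-formed UUID.
    uuid.UUID(req.parent_uuid)
    return req
```

A real endpoint would turn the `ValueError` into an HTTP 4xx response; which status code to use is left open here.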

Resources for brainstorming, conceptualizing and causing mild panic attacks:

@DiegoPino's example/concept on future.islandora: http://future.islandora.ca:8080/fcrepo/rest/DAM/books/book1/pages/page1

Possible conceptualization broader than large image: https://raw.githubusercontent.com/wiki/duraspace/pcdm/yEd/islandora-large-image/Islandora-PCDM-Large-Image.jpg

https://github.com/duraspace/pcdm/wiki/PCDM-2.0

duraspace/pcdm#53

@ruebot ruebot added PDX and removed Crayfish labels May 18, 2016
@whikloj
Member Author

whikloj commented May 18, 2016

I know @DiegoPino asked for this, but do we want a specified predicate on a File that defines it as the preservation master? Then we have to (possibly) check and update on any FileSet modification.

For example:

@prefix pcdm: <http://pcdm.org/models#> .

<#MyDog> a pcdm:Object ;
  pcdm:hasFileSet <#fs1> .

<#fs1> a pcdm:FileSet ;
  pcdm:hasFile <#file1> .

<#file1> a pcdm:File .

<#file1> would be our preservation master even if it was a low-res JPG. Then suppose we added a second file:

<#fs1> a pcdm:FileSet ;
  pcdm:hasFile <#file1>, <#file2> .

<#file1> a pcdm:File .

<#file2> a pcdm:File .

If <#file2> is a TIFF, we would have to remove the predicate from <#file1> and add it to <#file2>.

We would have to perform this comparison on any PUT/POST to the <#fs1> fileset.

Just wondering if that is a good option or should we be determining the preservation master only when we actually want to create a derivative?
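
To make the cost of that comparison concrete, here is a minimal sketch in Python. The format ranking, the `FileInfo` type, and `pick_preservation_master` are all hypothetical; nothing in this thread says how "better" would actually be decided.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical ranking: a higher number means a better preservation candidate.
FORMAT_RANK = {"image/tiff": 3, "image/jp2": 2, "image/png": 1, "image/jpeg": 0}


@dataclass(frozen=True)
class FileInfo:
    uri: str
    mime_type: str


def pick_preservation_master(files: Iterable[FileInfo]) -> Optional[FileInfo]:
    """Return the best-ranked file in a FileSet, or None if it is empty.

    Ties keep the earliest file, so an existing master is not displaced
    by a later file of the same format. Unknown formats rank lowest.
    """
    best = None
    for f in files:
        rank = FORMAT_RANK.get(f.mime_type, -1)
        if best is None or rank > FORMAT_RANK.get(best.mime_type, -1):
            best = f
    return best
```

Running this on every PUT/POST to the FileSet is the eager option; running it only when a derivative is requested is the lazy one.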

@br2490

br2490 commented May 18, 2016

@whikloj interesting point, maybe flagging a file at this level as preservation master does not make sense. It feels like once the conceptualization of FPR comes together it might make sense to revisit? Regardless I could stub it out in code and if it needs to be s/moved/removed it can be?

@whikloj
Member Author

whikloj commented May 18, 2016

@br2490 👍

@DiegoPino
Contributor

@whikloj , @br2490 FileSet?
Not sure if we are there yet. Are we officially assuming PCDM 2.0 as our semantic-structural model? If so, I need an ontology, not only example diagrams.
Can an object have multiple preservation masters attached, in different formats, describing different resources? In my mind, yes. I'm kinda lost.

@whikloj
Member Author

whikloj commented May 18, 2016

How are you defining an Object? In my mind an object is a single resource.

So if that is a complex object that would have multiple files of different formats describing different resources... wouldn't that be multiple objects?

@DiegoPino
Contributor

Not sure @whikloj. I can think of multiple use cases, but I'm probably out of library stuff (again, and again). I feel (damn feelings) we are handling this the way we did in old Islandora, and the whole OBJ + derivatives setup was a mess. What I would like (personal feelings again) is just a way to mark: hey, this binary resource, whatever slug it is named, should be used to make derivatives. Nothing more. And this other one is already a derivative of something else, so please, Camel, don't do anything with it. Not sure how to put this another way.

@br2490

br2490 commented May 19, 2016

I make no assumptions. I only want to stub something out, and when y'all (the committers) agree on an ontology we can work on the semantic restructure. I just like the idea of starting with something, even if the end result is totally different.

@DiegoPino
Contributor

@br2490 totally == starting with something == 👍 . Bad practice during a sprint to be super picky, my mistake sorry!!

@whikloj
Member Author

whikloj commented Jun 27, 2016

Re: https://github.com/br2490/pdx/blob/issue-189/src/FileService/README.md
From my point of view (and feel free to disagree)

While thinking about this I decided that we have to add a UUID to the NonRDFSource (read: binaries) /fcr:metadata. Otherwise, we can't get the file. We'd have to name them (i.e. PUT) and get them by name; a UUID seems better.

So you need to ensure that, if a UUID is not provided, one is added to the RDF. I'm not sure how to ensure this happens, as there will be two requests: one to PUT/POST the binary file and then one to PUT the fcr:metadata.

Perhaps Drupal doesn't give it a UUID and instead the FileService assigns the UUID and it gets synced back to Drupal through Islandora Sync??
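
As a rough illustration of the "FileService assigns the UUID" option, the sketch below mints a UUID when none is supplied and builds a SPARQL Update body for the follow-up request to /fcr:metadata. It is written in Python for brevity; `ensure_uuid`, `uuid_patch_body`, and especially the `UUID_PREDICATE` value are placeholders, since no predicate is named in this thread.

```python
import uuid
from typing import Optional

# Assumption: placeholder predicate for storing the UUID; the thread names none.
UUID_PREDICATE = "http://example.org/ns#uuid"


def ensure_uuid(supplied: Optional[str]) -> str:
    """Return the caller's UUID, or mint a version 4 UUID if none was given."""
    return supplied if supplied else str(uuid.uuid4())


def uuid_patch_body(file_uuid: str) -> str:
    """Build a SPARQL Update body for the binary's /fcr:metadata.

    The empty relative IRI <> is resolved by the server against the
    request URI, so the body does not need to know the file's location.
    """
    uuid.UUID(file_uuid)  # raises ValueError on a malformed UUID
    return f'INSERT {{ <> <{UUID_PREDICATE}> "{file_uuid}" }} WHERE {{}}'
```

Because the UUID is chosen before either request is sent, both the binary upload and the metadata update can carry the same value, which sidesteps part of the two-request problem.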

Back to the README

  1. Should add files to filesets in an Object

    Should we add a proxy for the File to the FileSet indirect container, much like how we add objects to a Collection? This is kind of a grey area, as I'm not sure there is much room for file reuse. Thoughts?

  2. IF no fileset container exists, create one and report

    Create a FileSet indirect container and add the File and then report?

  3. Be agnostic as to ontology,...

    So for this, I think you can pass the request along to the ResourceService and have it PUT/POST the binary for you. Then follow up with a PATCH to the /fcr:metadata endpoint.

  4. Should return something if successful else panic.

    Should definitely return something: probably a 201 if the file is uploaded; otherwise return the response that caught you off guard.
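
Points 3 and 4 together might look something like the sketch below. It is Python rather than the PHP the service would be written in, and `Response`, `Send`, and `upload_file` are hypothetical names. `Send` is just a stand-in for whatever the ResourceService exposes, and the expected status codes are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    status: int
    body: str = ""


# Hypothetical stand-in for the ResourceService: (method, path, payload) -> Response
Send = Callable[[str, str, bytes], Response]


def upload_file(send: Send, path: str, binary: bytes, sparql: str) -> Response:
    """PUT the binary, then PATCH its /fcr:metadata.

    Assumption: the repository answers 201 (created) or 204 (replaced)
    to the PUT and 204 to the PATCH. Any other response is handed back
    to the caller unchanged.
    """
    created = send("PUT", path, binary)
    if created.status not in (201, 204):
        return created
    patched = send("PATCH", f"{path}/fcr:metadata", sparql.encode("utf-8"))
    if patched.status != 204:
        return patched
    return Response(201, path)
```

Passing the failing upstream response straight through keeps the FileService agnostic about why the repository refused the request.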

Regarding Should accept

  1. put/post file/{object} - creates pcdm:Object and attaches any binaries as pcdm:File. Object here hasMember, hasRelatedObj.

    I would say, you don't accept a pcdm:Object. That would be the ObjectService (@bryjbrown), you accept the UUID of an Object in the route and add the pcdm:FileSet/pcdm:File to it.

  2. patch file/{object} - append additional metadata.

    Again, I would suggest that you only patch files, because what you'll be doing is locating the file (using the idToUri translation) and then appending /fcr:metadata. Leave Object patching to the PDX ObjectService.

    All Patch requests can be passed to the ResourceService in the end.

  3. get file/{id} - do something.

    So a GET can pass straight to the ResourceService, but what you could do is content negotiation. If the Accept: header is some form of RDF (e.g. text/turtle, application/ld+json, application/rdf+xml), then you actually want the /fcr:metadata information. Otherwise return the binary.

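    ResourceService (https://github.com/Islandora-CLAW/Crayfish/blob/master/src/ResourceService/Controller/ResourceController.php#L21) just needs to know if you want the object or the metadata.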

  4. delete file/{object} - delete something.

    Again, only deal with Files. You could (and this would be nice, but not a requirement) verify that the UUID asked for returns a pcdm:File, or else throw an Exception of some sort.

    But you could do that sort of validation for all these routes. Not sure if that would add too much lag.

    But for delete, again you can just pass this request to the ResourceService for the most part.
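
The content negotiation suggested for the GET route could be sketched like this. It is Python for illustration only; `wants_metadata` and `target_path` are hypothetical helpers, and the parsing is deliberately naive.

```python
# RDF serializations named in the comment above.
RDF_TYPES = {"text/turtle", "application/ld+json", "application/rdf+xml"}


def wants_metadata(accept_header: str) -> bool:
    """True if any media type in the Accept header is an RDF serialization.

    Deliberately simple: q-values and wildcards are ignored.
    """
    for part in accept_header.split(","):
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type in RDF_TYPES:
            return True
    return False


def target_path(file_path: str, accept_header: str) -> str:
    """Choose the binary itself or its /fcr:metadata description."""
    if wants_metadata(accept_header):
        return file_path.rstrip("/") + "/fcr:metadata"
    return file_path
```

The chosen path is then what gets handed to the ResourceService, so the FileService itself never needs to read the binary or the RDF.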

@br2490

br2490 commented Jun 27, 2016

Cool 😎 I just glanced at this and will read it carefully tonight. Thanks


@ruebot
Member

ruebot commented Aug 4, 2016

@br2490 @bryjbrown can y'all give a status update on this for @dannylamb's benefit?

@bryjbrown
Member

@ruebot @dannylamb I'm working on #222 (ObjectService) and @br2490 is working on #189 (FileService). We spent the June sprint discussing how PDX should implement PCDM, and settled on a very basic view of it (at least for the beginning) where we only use a subset of the http://pcdm.org/models# ontology.

I'm going to try to use the right words for this and probably fall way off the mark, but here goes: PDX's API should allow you to add the pcdm model type to a resource by specifying a resource UUID and the thing you want it to be (Collection/Object/File). It should also allow you to add pcdm relationships between resources with the appropriate predicates (hasFile, memberOf, etc) through the API by passing in the UUIDs of the resources to be linked.
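
A minimal sketch of that idea, in Python for illustration: the helpers below emit N-Triples statements for typing a resource and for linking two resources, restricted to a small set of PCDM terms. The function names are hypothetical, the exact subset PDX settled on is an assumption, and the UUID-to-URI lookup is assumed to have happened already.

```python
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumption: an illustrative subset of PCDM terms, not the agreed PDX list.
MODEL_TYPES = {"Collection", "Object", "File"}
PREDICATES = {"hasMember", "memberOf", "hasFile", "fileOf"}


def type_triple(resource_uri: str, model: str) -> str:
    """N-Triples statement typing a resource as a PCDM model."""
    if model not in MODEL_TYPES:
        raise ValueError(f"unsupported PCDM type: {model}")
    return f"<{resource_uri}> <{RDF_TYPE}> <{PCDM}{model}> ."


def relation_triple(subject_uri: str, predicate: str, object_uri: str) -> str:
    """N-Triples statement linking two resources with a PCDM predicate."""
    if predicate not in PREDICATES:
        raise ValueError(f"unsupported PCDM predicate: {predicate}")
    return f"<{subject_uri}> <{PCDM}{predicate}> <{object_uri}> ."
```

Rejecting anything outside the allowed sets is one way to enforce "only a subset of the ontology" at the API boundary.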

I mocked up an idea for how the ObjectService API would look by blatantly copying the API that was in place for CollectionService, which you can see here: https://github.com/bryjbrown/pdx#objectservice. I have no idea whether it's even appropriate, though, since I have zero experience with API design and a beginner's understanding of LOD principles. Ben has some notes on plans for the API design of the FileService here: https://github.com/br2490/pdx/tree/issue-189/src/FileService

@bryjbrown
Member

I would also add that since the majority of the work we've done thus far has been conceptual (trying to wrap our heads around PCDM, Silex and API design), I for one would be totally fine if @dannylamb (or any of the other community members) wanted to scrap my branch and start over in a more deliberate approach. Whatever makes it more usable in the long run I say.

@ruebot
Member

ruebot commented Aug 4, 2016

@bryjbrown this is great, and very much appreciated! Let's put this on the next CLAW agenda. I think it will be very relevant to the Alpaca discussion, and help us prioritize where we focus the MVP efforts.

@ruebot ruebot modified the milestone: Community Sprint - 10 Aug 17, 2016
6 participants