Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bundle or object in URI convention #252

Closed
dvoet opened this issue Apr 22, 2019 · 12 comments
Closed

bundle or object in URI convention #252

dvoet opened this issue Apr 22, 2019 · 12 comments

Comments

@dvoet
Copy link
Contributor

dvoet commented Apr 22, 2019

the work-in-progress docs for URI convention suggest the form drs://<server>/<id>. But this does not let the caller know if the identified thing is an object or a bundle making it hard to figure out the right request to make. I suggest adding something in the path to distinguish. drs://<server>/<bundle|object>/<id> or drs://<server>/<b|o>/<id> if length is a concern.

@briandoconnor
Copy link
Contributor

Good point, I reached the same conclusion when cleaning up the docs for v0.1. @dvoet can you take a look at release 0.1 (https://ga4gh.github.io/data-repository-service-schemas/preview/release/0.1/docs/) and see what you think? If folks don't like the style I put together here then we can open a PR and get something better in for v0.2.

Another thing, the spec does not cover is how to use GUIDs of various flavors with DRS (e.g. DOIs, https://dataguids.org/, etc). We don't attempt to create a standard around this or conventions for this in DRS at this time. So a URI like drs:// is out of scope (but, again, we could either add it in scope after talking to our drivers and add it in via a PR to 0.2 or say it's out of scope and something we should partner on with Discovery).

@briandoconnor
Copy link
Contributor

I talked with @dglazer and we came to the conclusion that, since this is not full baked, we should pull the text about URIs from the current spec docs for v0.1.

So here's that text that outlines one way of doing it:

For convenience, we define a recommended syntax for fully referencing DRS-accessible objects. Strings of the form drs:////<[bundle|object]>// mean “make a DRS
GET to the HTTPS address at , passing in the DRS id at the end of the appropriate /<[bundle|object]>/, to retrieve the DRS JSON response”.
The is optional depending on what you are doing e.g. the is /access when retrieving a particular access method. This is all just
a complicated way of just saying "replace drs:// with https:// and GET on the URL that results."

For example:

The service-info endpoint can be queried to retrieve information about the version of DRS supported by this server e.g. https:////service-info

These URIs will be useful when passing objects to a WES server for processing since they provide a way to understand how to successfully make an HTTPS request and process the JSON response.

@dglazer
Copy link
Member

dglazer commented Apr 28, 2019

Here's another possible, slightly simpler, approach to URI strings:

For convenience, including when passing content references to a WES server, we define a recommended URI syntax for DRS-accessible content. Strings of the form drs://<server>/<id> mean “you can fetch the DRS content with id <id> from the DRS server at <server>".

For example, if a WES server was asked to process drs://drs.example.org/314159, it would know that it could issue a GET request to http://drs.example.org/ga4gh/drs/v1/objects/314159 to learn how to fetch that object.

This approach doesn't address Doug's object vs. bundle concern -- I wonder if that's a more fundamental problem, where a DRS id currently is only useful if you separately know what kind of thing it represents.

@dvoet
Copy link
Contributor Author

dvoet commented Apr 29, 2019

There are 2 concerns that these URIs address:

  1. that the id is a DRS id and not just some arbitrary http site or string of characters
  2. durability by being independent from the version of the DRS api itself

Consumers will have to do something to address these concerns and it would be nice if it were the same thing so I will be sad to see guidance on the matter be pulled.

On the more fundamental problem, I agree that from a consumer's perspective it is not great that I must know a priori that an id refers to a bundle or object so I would be happy to see that distinction disappear from the api paths. Whatever response I get can (and does) tell me what kind of entity it represented.

@dglazer
Copy link
Member

dglazer commented Apr 29, 2019

Good comments @dvoet -- I see at least three approaches we could take here:
a) always add a qualifier to an ID when you pass it around (e.g. drs://<server>/<b|o>/<id)
b) mandate the type be part of the ID string (e.g. o.314159 and b.271828)
c) have one access method for all IDs, where the response is (logically) a union of types

The default is (a), and it's my least favorite. I could see either (b) or (c) working -- curious what others think.

@susheel
Copy link
Member

susheel commented Apr 29, 2019

@dvoet @dglazer @briandoconnor
We were planning on directly using the full native DRS API scheme:
drs://example.com/ga4gh/drs/v1/objects/65a3e501-7448-4887-aada-b52de5c8db31 for objects
drs://example.com/ga4gh/drs/v1/bundles/5007fc21-6430-4fb9-82b3-520c60e26751 for bundles

A Bundle's contents list of BundleObjects uses this scheme. See DRS v0.1 line 438

@dglazer
Copy link
Member

dglazer commented Apr 29, 2019

After looking a bit more closely, and thinking about where the Bundle definition settled and about earlier conversations with @susheel , I now lean towards (c). The API changes could be pretty straightforward:

  • make Object.access_methods optional (but document that it's required for Objects)
  • add an optional Object.contents field (but document that it's required for Bundles)
  • delete the Bundle object and the /bundles method

@susheel , re the example you list in BundleObject.drs_uri -- yes, that's one of the reasons we need to nail down this conversation. If we went with the proposals here, I think that would look like:
drs://example.com/65a3e501-7448-4887-aada-b52de5c8db31 for both objects and bundles.

Let's talk about the tradeoffs in the breakouts this afternoon.

@susheel
Copy link
Member

susheel commented Apr 29, 2019

👍@dglazer This unification is what I've been wanting for a while :) Our conversation about access_methods for Bundles, was part of this. Ideally, it would better after we upgrade to OAPI 3.0, as the OAPI spec helps to achieve these complex object specifications.

@dvoet
Copy link
Contributor Author

dvoet commented Apr 29, 2019

@susheel the reason we don't want to use the full native DRS API scheme as you describe is that we will be storing these references, probably for years. In that time I hope there will be revisions to this api spec. When that happens the version number embedded in the URL will change as well. I want to avoid the situation where we need to update numerous URLs in our databases to the latest version or force providers to retain support for the v1 spec.

@susheel
Copy link
Member

susheel commented Apr 29, 2019

@dvoet Our API version release plan was to 301 redirects to the latest compatible version. However, I completely understand your concern, so I'm hoping for an Object / Bundle unification to unify this.

@dglazer
Copy link
Member

dglazer commented Apr 30, 2019

See PR #258 for an almost-baked implementation of Object/Bundle unification. Please add any comments on unification there.

I just created Issue #259 to discuss URI convention, assuming unification has landed. I'll move over the relevant bits of this discussion soon, and will close this issue then.

@dglazer
Copy link
Member

dglazer commented Apr 30, 2019

#259 is now up to date; closing.

@dglazer dglazer closed this as completed Apr 30, 2019
dglazer added a commit that referenced this issue May 31, 2019
* fold Bundle into Object

* try new way to describe blobs vs. bundles

* require 1 checksum; more doc cleanup

* Object/Bundle unification

* minitems is lower case; more doc cleanup

* trying to find a typo

* fixed typo (finally)

* doc tweaks

* line-wrapping for checksum doc

* wrestling with line breaks

* more whitespace wrangling

* renamed BundleObject to ContentsObject
jaeddy added a commit that referenced this issue Jun 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants