Do we allow ark and other GUIDs as ID? How does Alias lookup work? #266

Closed
briandoconnor opened this issue May 6, 2019 · 12 comments

@briandoconnor
Contributor

Can I get your help to refine this?

@jaeddy
Member

jaeddy commented May 16, 2019

I feel like I'm dipping my toe into a monstrous whirlpool here, but just my two cents:

If a data repository has an API, that API can use whatever convention it wants to create identifiers (e.g., GUIDs, ark, etc.). However, to support DRS, I feel that can and should be a separate endpoint from the native access API. For example, Synapse mints [relatively] unique identifiers for all entities (e.g., syn12345); however, I wouldn't expect Synapse support for DRS to result in a URI that looks like "drs://synapse.org/syn12345". We could in theory, but given that our platform doesn't use DRS as the access protocol, additional engineering would be required to implement the /drs endpoint. If this engineering happens, then creating a hash to map drs IDs to syn IDs would require trivial additional effort.
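As a rough illustration of the "trivial additional effort" described above, an adapter sitting in front of an existing repository might keep a lookup table from the DRS IDs it mints to the repository's native IDs (such as syn12345). This is only a sketch: the class and method names are hypothetical, and minting opaque IDs with uuid4 is one possible choice, not something the thread prescribes.

```python
import uuid


class DrsIdAdapter:
    """Hypothetical adapter mapping DRS IDs it mints to native repository IDs."""

    def __init__(self):
        self._drs_to_native = {}
        self._native_to_drs = {}

    def register(self, native_id):
        # Reuse the existing DRS ID if this native ID was registered before.
        if native_id in self._native_to_drs:
            return self._native_to_drs[native_id]
        drs_id = str(uuid.uuid4())  # opaque, slash-free identifier
        self._drs_to_native[drs_id] = native_id
        self._native_to_drs[native_id] = drs_id
        return drs_id

    def to_native(self, drs_id):
        # Returns None for IDs this adapter never minted.
        return self._drs_to_native.get(drs_id)
```

With this in place, the /drs endpoint would call `to_native()` and then hand off to the repository's existing access logic.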

I think it's fair to assume that most implementations of DRS will involve some additional layer/adapter on top of an existing repository, rather than representing the design on which the repository (and its API) was built. As such, if the community-standard API is more useful by requiring a certain format for IDs, then I think it's fair for implementers to do a little bit of extra work to conform to those requirements.

On the other hand, if we want to specify a service that resolves GUIDs into drs IDs, then great — but that's a separate service. DRS, as defined today, is about access and retrieval (as well as portability), not discovery or persistence. We don't need to boil the ocean with a single API.

@sarpera
Contributor

sarpera commented May 16, 2019

Agree with @jaeddy.

The title of this issue might be a bit misleading, though. Re-using an existing identifier (like a GUID) as a DRS ID wouldn't cause a problem if we could guarantee that the API paths would never be affected by the syntax of those IDs. The DRS API should define the syntax rules for the ID property (allowed characters, maximum length, etc.).
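To make "syntax rules for the ID property" concrete, a server could check candidate IDs against a character set and a length limit. The specific rules below are assumptions for illustration only: the character set is the RFC 3986 "unreserved" set (safe in a URL path segment without escaping), and the 255-character limit is made up.

```python
import re

# Hypothetical rules for illustration; not values the DRS spec has fixed.
MAX_ID_LENGTH = 255
# RFC 3986 "unreserved" characters: letters, digits, and . _ ~ -
ID_PATTERN = re.compile(r"[A-Za-z0-9._~-]+")


def is_valid_drs_id(candidate):
    """Return True if candidate satisfies the hypothetical ID syntax rules."""
    return (
        0 < len(candidate) <= MAX_ID_LENGTH
        and ID_PATTERN.fullmatch(candidate) is not None
    )
```

Under these rules `syn12345` passes, while an ARK-style ID such as `ark:/12345/x9` (a made-up example) fails because of its colon and slashes.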

GUIDs with forward slashes as DRS IDs are one of the main issues here (see my comment on Should we allow slashes in DRS IDs? #263).

If there are enough use cases for DRS to support fetching an object/bundle by a secondary identifier (a unique value), I do see value in a DRS server supporting look-ups for GET requests. I imagine those look-ups being based on properties of an object/bundle that, like the ID property, are unique within a DRS server, so that the GET response is always a single object/bundle or no response at all. Checksum values and aliases (GUIDs and the like) may be candidates for this.
Something like:
drs://<server>/?alias=<GUID>
drs://<server>/?checksum=<checksum_value>
would be equivalent to drs://<server>/<ID>
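A minimal server-side sketch of these look-ups, assuming (as the proposal requires) that IDs, aliases, and checksums are each unique within the server. The `DrsObject` and `ObjectStore` names are hypothetical. The index rejects duplicate keys at load time, so a lookup returns exactly one object or `None`; in practice identical content would share a checksum, which is exactly the uniqueness question this proposal would have to settle.

```python
from dataclasses import dataclass, field


@dataclass
class DrsObject:
    id: str
    checksum: str
    aliases: list = field(default_factory=list)


class ObjectStore:
    """Hypothetical index where every key identifies at most one object."""

    def __init__(self, objects):
        self._by_id = {}
        self._by_alias = {}
        self._by_checksum = {}
        for obj in objects:
            self._add_unique(self._by_id, obj.id, obj)
            self._add_unique(self._by_checksum, obj.checksum, obj)
            for alias in obj.aliases:
                self._add_unique(self._by_alias, alias, obj)

    @staticmethod
    def _add_unique(index, key, obj):
        # Refuse duplicates so a lookup can never match more than one object.
        if key in index:
            raise ValueError(f"duplicate key {key!r}")
        index[key] = obj

    def get(self, object_id=None, alias=None, checksum=None):
        """Return the single matching object, or None if nothing matches."""
        if object_id is not None:
            return self._by_id.get(object_id)
        if alias is not None:
            return self._by_alias.get(alias)
        if checksum is not None:
            return self._by_checksum.get(checksum)
        return None
```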

@tetron

tetron commented May 16, 2019

Quoting @jaeddy: "I wouldn't expect Synapse support for DRS to result in a URI that looks like 'drs://synapse.org/syn12345'. We could in theory, but given that our platform doesn't use DRS as the access protocol, additional engineering would be required to implement the /drs endpoint."

I'm a little confused by this, because I thought the entire premise of DRS was to be able to refer to data objects using drs:// URLs embedding native IDs minted by the platform, and the protocol provides a way to negotiate data access.

@geoffjentry
Contributor

Quoting @tetron: "I thought the entire premise of DRS..."

IMO one thing that's been true from the start is that there's been no consensus on what the entire premise of DRS is. I've seen, and have stated myself, a number of "the whole premise of DRS is X" statements. At the moment I'm still unsure if people are generally describing the same concept via different lenses or are describing different concepts altogether.

FWIW I agree with @jaeddy here. I don't see why DRS needs to cleanly fit on top of existing repositories without any translation layer. As an example, the work we did towards building WES support in Cromwell worked that way, and presumably something similar for you @tetron regarding Arvados. We should be careful not to make things untranslatable, but I don't think we should be ensuring a zero-effort liftover for would-be implementers.

@tetron

tetron commented May 16, 2019

Quoting @geoffjentry: "IMO one thing that's been true from the start is that there's been no consensus on what the entire premise of DRS is."

Fair enough! My view has always been that DRS is supposed to be a way to broker between existing systems of identifiers by adding namespaces and a standardized access method. The only thing that a drs:// URI means is that a particular DRS server can give you additional information about a particular identifier.
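Under this reading, the host in a drs:// URI acts as the namespace and the path is an identifier within it. A minimal sketch of turning such a URI into an HTTPS request URL, assuming the `/ga4gh/drs/v1/objects/{id}` path layout used by DRS v1; the function name and example host are hypothetical.

```python
from urllib.parse import quote, urlsplit


def drs_uri_to_https(drs_uri):
    """Split drs://<host>/<id> and build the object's HTTPS metadata URL."""
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs" or not parts.netloc:
        raise ValueError(f"not a drs:// URI: {drs_uri!r}")
    # The host is the namespace; everything after it is the identifier.
    object_id = parts.path.lstrip("/")
    if not object_id:
        raise ValueError("missing object ID")
    # Assumed DRS v1 path layout; the ID is escaped as a single path segment.
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"
```

For example, `drs://drs.example.org/obj1` would map to `https://drs.example.org/ga4gh/drs/v1/objects/obj1`.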

Quoting @geoffjentry: "I don't see why DRS needs to cleanly fit on top of existing repositories without any translation layer."

Of course most systems will need an adapter layer, which is why I was confused by @jaeddy's comment that seems to suggest in the first paragraph that "additional engineering would be required" is a problem, but then in the second paragraph suggests that an adapter layer (which would be additional engineering work?) is obviously necessary.

On the original topic of aliases: I would say that it should be possible for drs://host/obj1, drs://host/obj2, and drs://host/obj3 to all refer to the same content; each could list the others as aliases:

obj1 aliases: [obj2, obj3]

obj2 aliases: [obj1, obj3]

obj3 aliases: [obj1, obj2]
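The alias lists above follow mechanically from knowing which IDs point at the same content. A small sketch, assuming a hypothetical input of ID groups: no canonical identifier is needed, because every ID in a group simply lists the others.

```python
def alias_lists(groups):
    """Given groups of IDs naming the same content, list each ID's aliases."""
    aliases = {}
    for group in groups:
        for object_id in group:
            if object_id in aliases:
                raise ValueError(f"{object_id!r} appears more than once")
            # Every other ID in the same group is an alias, kept in group order.
            aliases[object_id] = [other for other in group if other != object_id]
    return aliases
```

Calling `alias_lists([["obj1", "obj2", "obj3"]])` reproduces the three lists above. Picking a canonical ID (say, the first in each group) would be an optional extra on top of this, not a requirement.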

The question is whether to have (and/or require) a single canonical identifier.

@geoffjentry
Contributor

Quoting @tetron: "Fair enough!"

Just to be clear, I wasn't picking on you :) It just feels to me that this is why we've spent a number of months all talking past each other on some of these finer details.

Quoting @tetron: "The question is whether to have (and/or require) a single canonical identifier."

IMO that shouldn't be necessary. Under the hood there's a bucket o' bytes (tm) and there are pointer(s) to that bucket. I don't know that I'd support it being presented as best practice, but I don't see a problem with it.

@jaeddy
Member

jaeddy commented May 17, 2019

Sorry for the confusion @tetron! I wasn't trying to argue that requiring additional engineering would be a problem — I think it should be the general expectation. My concern is that some changes to the DRS rules seem to be driven by cases where a Driver project has implemented the API directly within their repository system, and conforming to the current spec would require additional work for them. I think we should be OK saying that "if your repository API doesn't match the DRS spec, it's not DRS" and asking for that additional work — which will almost certainly be the case for most other DRS implementors moving forward.

On the other hand, I recognize that we're supposed to let the Driver projects drive... so not entirely sure how to handle the tension.

@ddietterich
Contributor

As I understand the spec, an object can have multiple DRS object ids, any of which can be used for a DRS lookup.

It seems to me that any existing id can be used as the DRS object id if it is encoded to accommodate the DRS id constraints. I don't think we are setting up some unworkable hardship on existing systems any more than URL-encoding does.
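One way to read "encoded to accommodate the DRS id constraints" is percent-encoding, the same mechanism as URL-encoding. A sketch using Python's standard `urllib.parse`; the helper names are hypothetical, and `safe=""` is what forces `/` to be escaped (by default `quote` leaves `/` alone).

```python
from urllib.parse import quote, unquote


def encode_native_id(native_id):
    # safe="" escapes "/" and ":" too; only letters, digits, and _.-~ remain.
    return quote(native_id, safe="")


def decode_drs_id(drs_id):
    # Reverses the encoding to recover the repository's native ID.
    return unquote(drs_id)
```

For a made-up ARK such as `ark:/12345/x9`, this yields `ark%3A%2F12345%2Fx9`, which fits in a single path segment. One caveat: some web servers and proxies reject or pre-decode `%2F` in paths before routing, so this approach relies on the deployment passing encoded slashes through untouched.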

The outstanding issue is the handling of forward slashes in ids. We cannot both allow ids with forward slashes in the id syntax and also have an API like:

GET object/:id/access

If we want to hew to the RESTful conventions, we would disallow forward slash in our ids. If we want to go against the grain, then we need to rearrange that endpoint. I don't have a strong position on that choice.
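The conflict can be shown with two path templates modeled on the endpoints above. The route names and regexes here are illustrative assumptions, not a real router. If an ID may contain `/`, the path `object/foo/access` matches both routes; if IDs are single path segments, only one interpretation remains.

```python
import re

# Two hypothetical routes modeled on the endpoints under discussion:
#   GET object/:id          -> the object itself
#   GET object/:id/access   -> access details for that object
ROUTE_TEMPLATES = {
    "get_object": r"object/({id})",
    "get_access": r"object/({id})/access",
}

SLASHES_ALLOWED = r".+"  # an ID may contain "/"
NO_SLASHES = r"[^/]+"  # an ID is exactly one path segment


def interpretations(path, id_pattern):
    """Return every (route, id) pair that could explain a request path."""
    matches = []
    for name, template in ROUTE_TEMPLATES.items():
        m = re.fullmatch(template.format(id=id_pattern), path)
        if m:
            matches.append((name, m.group(1)))
    return matches
```

With `SLASHES_ALLOWED`, `object/foo/access` could mean the object `foo/access` or the access endpoint of `foo`. With `NO_SLASHES`, only the second reading survives.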

@sarpera
Contributor

sarpera commented May 22, 2019

I agree with @ddietterich. I guess we are overlooking the RESTfulness aspect of it, both for IDs containing slashes and for #252 (object/bundle unification). To follow the convention, no slashes would be allowed in IDs, and each API path would correspond to only one type of entity (no unification).

Then again, one would argue DRS doesn't claim to be RESTful.

@rishidev modified the milestones: DRS v1.0, DRS v1.1 Jun 18, 2019
@ianfore added the Function:IDs (Related to ID and prefix functionality) label Jul 29, 2020
@ianfore

ianfore commented Jul 29, 2020

Much has happened since this was originally discussed. Could this be moot at this point? It may be worth working this through in terms of how compact identifiers could be used. Taking James's original use case at the start of the issue, SAGE could register a prefix with identifiers.org/n2t.net. This would be independent of DRS. Say sage: were registered as the prefix; then sage:syn12345 would be resolved by the meta-resolvers and redirected to SAGE's registered endpoint, whether it were DRS or something else.
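A sketch of the resolution step described above, with a local dictionary standing in for a meta-resolver's registry. Both the `sage` prefix and the provider URL are hypothetical, as is the `{accession}` placeholder style. The point is that the registered endpoint can be a DRS server or anything else; the resolver only splits `prefix:accession` and redirects.

```python
# Hypothetical registry standing in for identifiers.org / n2t.net: each prefix
# maps to the URL pattern its owner registered. "sage" and the URL are made up.
REGISTRY = {
    "sage": "https://resolver.example.org/objects/{accession}",
}


def split_compact_id(compact_id):
    """Split "prefix:accession" at the first colon."""
    prefix, sep, accession = compact_id.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return prefix, accession


def resolve(compact_id, registry=REGISTRY):
    """Return the URL a meta-resolver would redirect this identifier to."""
    prefix, accession = split_compact_id(compact_id)
    if prefix not in registry:
        raise LookupError(f"unregistered prefix: {prefix!r}")
    return registry[prefix].format(accession=accession)
```

Here `resolve("sage:syn12345")` would return `https://resolver.example.org/objects/syn12345`.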

@14159012

Much has happened since this was originally discussed. Could it be moot at this point? Perhaps it is worth working through how compact identifiers could be used. Registering James's original use case from the start of the issue could lead to registering a prefix with identifiers.org/n2t.net. This would be independent of DRS. Say Sage: were registered as the prefix; then Sage:syn12345 would be resolved by the meta-resolver and redirected to SAGE's registered endpoint, whether it were DRS or anything else.

ianfore

@briandoconnor
Contributor Author

I believe this has been resolved, but please re-open if I'm wrong.
