Do we allow ark and other GUIDs as ID? How does Alias lookup work? #266
I feel like I'm dipping my toe into a monstrous whirlpool here, but here are my two cents: if a data repository has an API, that API can use whatever convention it wants to create identifiers (e.g., GUIDs, ARKs, etc.). However, I feel that DRS support can and should be provided through a separate endpoint from the native access API. For example, Synapse mints [relatively] unique identifiers for all entities (e.g., …)

I think it's fair to assume that most implementations of DRS will involve some additional layer or adapter on top of an existing repository, rather than representing the design on which the repository (and its API) was built. As such, if the community-standard API is more useful when it requires a certain format for IDs, then I think it's fair for implementers to do a little extra work to conform to those requirements. On the other hand, if we want to specify a service that resolves GUIDs into …
Agree with @jaeddy. The title of this issue might be a bit misleading, though. Reusing an existing identifier (like a GUID) as a DRS ID wouldn't cause problems if we could guarantee that the API paths would never be affected by the syntax of those IDs. The DRS API should define syntax rules for the ID property (allowed characters, maximum length, etc.). GUIDs with forward slashes as DRS IDs are one of the main issues here (see my comment on #263, "Should we allow slashes in DRS IDs?"). If there are enough use cases for DRS to support fetching an object/bundle by a secondary identifier (a unique value), I do see added value in a DRS server supporting lookups via GET requests. I imagine those lookups being based on properties of an object/bundle that, like the ID property, are unique within a DRS server, so that a GET request always returns either a single object/bundle or nothing at all. Checksum values and aliases (GUIDs and the like) may be candidates for this.
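As an illustration of what spec-defined ID syntax rules could look like, here is a minimal Python sketch. The rule set (RFC 3986 "unreserved" characters only, 1 to 255 characters) and the function name `is_valid_drs_id` are hypothetical assumptions, not anything the DRS spec defines.

```python
import re

# Hypothetical rule set for illustration only (not from the DRS spec):
# RFC 3986 "unreserved" characters, 1 to 255 characters long.
MAX_ID_LENGTH = 255
ID_PATTERN = re.compile(r"[A-Za-z0-9._~-]+")


def is_valid_drs_id(candidate: str) -> bool:
    """Return True if `candidate` satisfies the hypothetical ID rules above."""
    if not 0 < len(candidate) <= MAX_ID_LENGTH:
        return False
    return ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    print(is_valid_drs_id("syn12345"))               # True
    print(is_valid_drs_id("ark:/13030/tf5p30086k"))  # False: ':' and '/'
    print(is_valid_drs_id("a" * 300))                # False: too long
```

Under rules like these, an ARK or slash-containing GUID would need to be encoded (or mapped to a different ID) before it could serve as a DRS ID.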
I'm a little confused by this, because I thought the entire premise of DRS was to refer to data objects using drs:// URLs that embed native IDs minted by the platform, with the protocol providing a way to negotiate data access.
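A sketch of that reading of drs:// URLs, assuming the hostname-based form `drs://<hostname>/<id>` and the `/ga4gh/drs/v1/objects/{object_id}` path from the published DRS v1 spec (which postdates parts of this thread). The function name `drs_uri_to_https` is hypothetical; it only builds the URL and assumes the ID contains no reserved characters.

```python
from urllib.parse import urlsplit


def drs_uri_to_https(uri: str) -> str:
    """Map drs://<hostname>/<id> to the DRS v1 object endpoint on that host."""
    parts = urlsplit(uri)
    if parts.scheme != "drs" or not parts.netloc:
        raise ValueError(f"not a hostname-based drs:// URI: {uri!r}")
    object_id = parts.path.lstrip("/")
    if not object_id:
        raise ValueError(f"no object ID in {uri!r}")
    # Assumes the ID contains no '/' or other reserved characters.
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{object_id}"


if __name__ == "__main__":
    print(drs_uri_to_https("drs://drs.example.org/314159"))
    # https://drs.example.org/ga4gh/drs/v1/objects/314159
```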
IMO one thing that's been true from the start is that there's been no consensus on what the entire premise of DRS is. I've seen, and have made myself, a number of "the whole premise of DRS is X" statements. At the moment I'm still unsure whether people are describing the same concept through different lenses or describing different concepts altogether. FWIW I agree with @jaeddy here. I don't see why DRS needs to fit cleanly on top of existing repositories without any translation layer. As an example, the work we did towards building WES support in Cromwell worked that way, and presumably something similar was true for you, @tetron, with Arvados. We should be careful not to make things untranslatable, but I don't think we should be ensuring a zero-effort liftover for would-be implementers.
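To make the "translation layer" idea concrete, here is a minimal sketch assuming a hypothetical native repository keyed by its own entity IDs. The adapter leaves the repository untouched and only reshapes its records into a subset of the DRS v1 object fields (`id`, `self_uri`, `size`, `checksums`, `access_methods`); a real DRS object carries more, such as `created_time`. `NativeEntity`, `DrsAdapter` and the hostname are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NativeEntity:
    """A record as a hypothetical existing repository stores it."""
    entity_id: str
    size_bytes: int
    md5: str
    storage_url: str


class DrsAdapter:
    """Hypothetical translation layer: exposes native records as DRS-style
    objects without changing the repository or its native IDs."""

    def __init__(self, hostname: str, records: Dict[str, NativeEntity]) -> None:
        self.hostname = hostname
        self.records = records

    def get_object(self, object_id: str) -> Optional[dict]:
        entity = self.records.get(object_id)
        if entity is None:
            return None  # an HTTP front end would map this to a 404
        return {
            "id": entity.entity_id,
            "self_uri": f"drs://{self.hostname}/{entity.entity_id}",
            "size": entity.size_bytes,
            "checksums": [{"checksum": entity.md5, "type": "md5"}],
            "access_methods": [
                {"type": "https", "access_url": {"url": entity.storage_url}},
            ],
        }
```

The design choice is that all DRS-specific shaping lives in the adapter, so the native API and its ID convention can stay exactly as they are.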
Fair enough! My view has always been that DRS is supposed to be a way to broker between existing systems of identifiers by adding namespaces and a standardized access method. The only thing a drs:// URI means is that a particular DRS server can give you additional information about a particular identifier.
Of course most systems will need an adapter layer, which is why I was confused by @jaeddy's comment: the first paragraph seems to suggest that "additional engineering would be required" is a problem, but the second paragraph suggests that an adapter layer (which would be additional engineering work?) is obviously necessary.

On the original topic of aliases: I would say it should be possible for drs://host/obj1, drs://host/obj2 and drs://host/obj3 to all refer to the same content, in which case each could list the others as aliases (alias: obj1 obj2 obj3). The question is whether to have (and/or require) a single canonical identifier.
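A minimal sketch of the alias idea, assuming a hypothetical in-memory index in which several IDs (obj1, obj2, obj3) point at the same stored bytes. A lookup by any ID returns exactly one blob or nothing, and `canonical` shows just one possible policy (the first registered ID wins) for the open canonical-identifier question. `AliasIndex` and the blob key `blob-42` are illustrative names.

```python
from typing import Dict, List, Optional


class AliasIndex:
    """Hypothetical in-memory index in which several IDs share one blob."""

    def __init__(self) -> None:
        self._blob_of: Dict[str, str] = {}       # any ID -> blob key
        self._ids_of: Dict[str, List[str]] = {}  # blob key -> IDs, in registration order

    def register(self, object_id: str, blob_key: str) -> None:
        # Each ID must stay unique on this server so lookups are unambiguous.
        if object_id in self._blob_of:
            raise ValueError(f"ID already in use: {object_id}")
        self._blob_of[object_id] = blob_key
        self._ids_of.setdefault(blob_key, []).append(object_id)

    def resolve(self, object_id: str) -> Optional[str]:
        """Return the blob key for any ID, or None if the ID is unknown."""
        return self._blob_of.get(object_id)

    def aliases(self, object_id: str) -> List[str]:
        """All IDs (including object_id itself) that refer to the same blob."""
        blob_key = self._blob_of.get(object_id)
        return [] if blob_key is None else list(self._ids_of[blob_key])

    def canonical(self, object_id: str) -> Optional[str]:
        """One possible policy: treat the first registered ID as canonical."""
        ids = self.aliases(object_id)
        return ids[0] if ids else None


if __name__ == "__main__":
    index = AliasIndex()
    for object_id in ("obj1", "obj2", "obj3"):
        index.register(object_id, "blob-42")
    print(index.aliases("obj2"))    # ['obj1', 'obj2', 'obj3']
    print(index.canonical("obj3"))  # obj1
```

Dropping `canonical` entirely corresponds to the "bucket o' bytes with several pointers" view: every ID is equally valid and none is privileged.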
Just to be clear, I wasn't picking on you :) It just feels to me like this is why we've spent a number of months talking past each other on some of these finer details.
IMO that shouldn't be necessary. Under the hood there's a bucket o' bytes (tm) and there are pointer(s) to that bucket. I don't know that I'd support it being presented as best practice, but I don't see a problem with it.
Sorry for the confusion, @tetron! I wasn't trying to argue that requiring additional engineering would be a problem; I think it should be the general expectation. My concern is that some changes to the DRS rules seem to be driven by cases where a Driver project has implemented the API directly within their repository system, and conforming to the current spec would require additional work for them. I think we should be OK saying "if your repository API doesn't match the DRS spec, it's not DRS" and asking for that additional work, which will almost certainly be needed for most other DRS implementers moving forward. On the other hand, I recognize that we're supposed to let the Driver projects drive... so I'm not entirely sure how to handle the tension.
As I understand the spec, an object can have multiple DRS object IDs, any of which can be used for a DRS lookup. It seems to me that any existing ID can be used as a DRS object ID if it is encoded to accommodate the DRS ID constraints. I don't think that imposes any more of a hardship on existing systems than URL-encoding does. The outstanding issue is the handling of forward slashes in IDs. We cannot both allow forward slashes in the ID syntax and also have an API like:
If we want to hew to RESTful conventions, we would disallow forward slashes in our IDs. If we want to go against the grain, then we need to rearrange that endpoint. I don't have a strong position on that choice.
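The slash conflict can be seen with a toy router. The route table below is a hypothetical illustration shaped like DRS v1's `/objects/{object_id}` and `/objects/{object_id}/access/{access_id}` paths, not necessarily the API this comment had in mind: a raw slash in an ID gets misread as a sub-resource (or matches nothing), while percent-encoding the ID with `urllib.parse.quote(..., safe="")` keeps it in a single path segment.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import quote, unquote

# Hypothetical route table; "[^/]+" matches exactly one path segment.
ROUTES = [
    ("object", re.compile(r"^/objects/(?P<object_id>[^/]+)$")),
    ("access", re.compile(
        r"^/objects/(?P<object_id>[^/]+)/access/(?P<access_id>[^/]+)$")),
]


def route(path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (route name, percent-decoded params) for the first match."""
    for name, pattern in ROUTES:
        match = pattern.match(path)
        if match:
            params = {k: unquote(v) for k, v in match.groupdict().items()}
            return name, params
    return None


if __name__ == "__main__":
    raw_id = "a/access/b"
    # Raw slashes: the ID is misread as an object ID plus an access ID.
    print(route(f"/objects/{raw_id}"))
    # ('access', {'object_id': 'a', 'access_id': 'b'})
    # Encoded slashes: the whole ID stays in one segment.
    print(route(f"/objects/{quote(raw_id, safe='')}"))
    # ('object', {'object_id': 'a/access/b'})
```

Percent-encoding is not a complete fix on its own: some HTTP servers and proxies decode or reject `%2F` before routing, which is one argument for simply disallowing slashes in IDs.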
I agree with @ddietterich. I guess we have been overlooking the RESTfulness aspect, both for IDs containing slashes and for #252 (object/bundle unification). Following the convention, no slashes would be allowed in IDs, and each API path would correspond to only one type of entity (no unification). Then again, one could argue that DRS doesn't claim to be RESTful.
Much has happened since this was originally discussed; could this be moot at this point? It may be worth working through in terms of how compact identifiers could be used. Taking James's original use case from the start of the issue, SAGE could register a prefix with identifiers.org/n2t.net, independently of DRS. If sage: were registered as the prefix, then sage:syn12345 would be resolved by the meta-resolvers and redirected to SAGE's registered endpoint, whether that were DRS or something else.
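For comparison, a sketch of the compact-identifier route, assuming a hypothetical registered `sage` prefix as in the comment. The function `resolver_urls` (a hypothetical name) only builds the meta-resolver URLs of the form `https://identifiers.org/<prefix>:<accession>` and `https://n2t.net/<prefix>:<accession>`; the actual redirect to SAGE's registered endpoint happens on the resolver side. The prefix pattern and the case-insensitive prefix handling are simplifying assumptions, not the registries' exact rules.

```python
import re
from typing import Dict

# Simplified compact-identifier syntax: <prefix>:<accession>.
COMPACT_ID = re.compile(r"^(?P<prefix>[A-Za-z0-9._-]+):(?P<accession>\S+)$")

# Meta-resolver URL templates; each takes the normalized "prefix:accession".
RESOLVERS = {
    "identifiers.org": "https://identifiers.org/{}",
    "n2t.net": "https://n2t.net/{}",
}


def resolver_urls(compact_id: str) -> Dict[str, str]:
    """Build the resolver URLs for a compact identifier (no network access)."""
    match = COMPACT_ID.match(compact_id)
    if match is None:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    # Assumes prefixes are case-insensitive; the accession is left unchanged.
    normalized = f"{match['prefix'].lower()}:{match['accession']}"
    return {name: template.format(normalized) for name, template in RESOLVERS.items()}


if __name__ == "__main__":
    # "sage" is a hypothetical prefix, as in the comment above.
    for name, url in resolver_urls("sage:syn12345").items():
        print(name, url)
```

Note that nothing here depends on DRS: the registered endpoint behind the prefix could be a DRS server or any other service.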
ianfore |
I believe this has been resolved, but please reopen if I'm wrong.