Update facets of IDs #79

Closed
joncison opened this issue Jun 3, 2017 · 13 comments

Comments

@joncison
Member

joncison commented Jun 3, 2017

No description provided.

@joncison joncison added the bug label Jun 3, 2017
@joncison joncison self-assigned this Jun 3, 2017
@joncison
Member Author

joncison commented Jun 4, 2017

and make CollectionID match

@joncison joncison added enhancement and removed bug labels Jun 6, 2017
@matuskalas
Member

matuskalas commented Jun 7, 2017

As there are more updates necessary for the IDs, I'm extending this issue and listing them here:

  • Update maxlen facet on toolID to match what's currently supported in bio.tools (was the original title of this issue). Personally, I don't see a need for a formal maxlen constraint. What do others think? Or is it for "anti-spam" purposes? I personally don't have anything against a maxlen constraint either, as long as it is a round number (i.e. 32 or 64, but not 50 or 100 ;-D). 16 is too short, so 32 is my vote.

  • Making a BiotoolsID simpleType in the XSD, used for tools, collections, etc.

  • Make all IDs a subtype of xs:NCName, with a matching pattern. Otherwise they can't be used in RDF/XML!! A good pattern for bio.tools that matches NCName would be something like [_a-z][_\-.0-9a-z]{,31} or [_a-zA-Z][_\-.0-9a-zA-Z]{,31} (see next one). One may consider some non-English characters such as é, but rather not! (They're fine in names, but not in an ID! We don't want both BLAST and BLÁST, or Protégé and Protége and Protegé and Protege. Unless they're handled in the same way as the next point suggests about capitals.)

  • I don't mind if capital letters are allowed (they're pretty), but both the http resolution and the uniqueness must work case-insensitively! E.g. http://bio.tools/pigNaLc should resolve|http-rewrite|redirect to http://bio.tools/PignalC . Is this how it works in bio.tools, @ekry ? If yes, awesome!!! If not, it should! ;-)

  • Having /tool or /collection in the URI is undesirable for many reasons:

@matuskalas matuskalas changed the title Update maxlen facet on toolID to match what's currently supported in bio.tools Update facets of IDs Jun 7, 2017
@joncison
Member Author

joncison commented Jun 7, 2017

  • maxlen facet: I think we'll know a sensible value once a big clean-up of existing IDs is complete (@hans & I are busy with this). The idea is that it should be big enough to allow all sensible names (but not crazy big).

  • there is currently a biotoolsIdType simpleType (used in versionID and toolID) and a biotoolsUrlType (used for relation->biotoolsId), which includes valid bio.tools domains. The latter type is probably not needed (just use the ID without the URL bit?)

  • bio.tools Tool Card URLs do resolve in a case-insensitive way

  • the plan indeed is to support bio.tools/toolName

  • as for collections (and other things we may want to provide information / endpoints for ), I'm not sure yet, distinct namespaces could be useful if there are several types of thing. I see no harm in supporting bio.tools/tool/toolName for tool info, bio.tools/collection/collectionName etc. given we'll support the "cool" style above.

PS. handling collections (which are currently just tags) definitely needs more thought!
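
To make the case-insensitive behaviour discussed above concrete, here is a minimal sketch of one way to get both case-insensitive resolution and case-insensitive uniqueness. It is not how bio.tools implements it; the `ToolRegistry` class and its methods are hypothetical names used only for illustration.

```python
class ToolRegistry:
    """Toy in-memory registry (hypothetical, not the bio.tools implementation)."""

    def __init__(self):
        # Maps the case-folded ID to the spelling used at registration time.
        self._by_key = {}

    def register(self, tool_id: str) -> None:
        """Register an ID, rejecting any ID that differs only by letter case."""
        key = tool_id.casefold()
        if key in self._by_key:
            raise ValueError(
                f"{tool_id!r} clashes with existing ID {self._by_key[key]!r}"
            )
        self._by_key[key] = tool_id

    def resolve(self, requested: str):
        """Return the registered spelling for any casing, or None if unknown."""
        return self._by_key.get(requested.casefold())


registry = ToolRegistry()
registry.register("PignalC")
canonical = registry.resolve("pigNaLc")  # the spelling to redirect to
```

Storing the case-folded form as the lookup key keeps the "pretty" capitalised spelling for display while ensuring that two IDs differing only in case cannot coexist.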

@joncison
Member Author

joncison commented Nov 4, 2017

UPDATE cc @matuskalas @hansioan

The new model for identifiers is:
[image: capture]

value has no maxlen facet. As all the bio.tools toolIDs are now manually verified anyway (Hans please see bio-tools/biotoolsRegistry#279) this should be OK. It's xs:token with appropriate regexes (neither xs:QName nor xs:NCName would work).

The maxLen facet on biotoolsIdType (used in relation->biotoolsId) is removed.

bio.tools URLs are now of the simpler form and, yes, e.g. https://bio.tools/SiGnalp resolves.

@matuskalas
Member

matuskalas commented Nov 9, 2017

Could you please tick off the boxes of #79 (comment) according to which are done?

To my understanding:

  • the 1st is done by removing the maxLength, i.e. no length constraint, which is fine. But is this also compatible with the functionality of bio.tools?
  • the last, 5th point is done for tools, which is really excellent, hooray! The XSD of biotoolsUrlType needs to be updated accordingly.
  • the 3rd point is ABSOLUTELY CRUCIAL and should be fixed. (@albangaignard: are there any additional restrictions due to other LD formats than RDF/XML?)

@joncison
Member Author

joncison commented Nov 10, 2017

I've ticked all the boxes for things done. bio.tools will be updated - beginning next week - to what will become biotoolsSchema 3.0.0.

Except for use of NCName, which I'm not sure about. NCName precludes the use of ":", and without the prefixes (which include ":") it's not possible to disambiguate instances of different types of identifier value. That would subsequently force us to make typing (via identifier->type) of the supplied identifier values mandatory at registration time (whether via UI or API).

Rather, I think we should treat all identifiers consistently as CURIEs (https://www.w3.org/TR/2010/NOTE-curie-20101216/), more specifically as having syntax prefix:value, and let any application logic drop the prefix, or indeed map the CURIE to something resolvable, as required.

The implication here is that we need a "canonical" prefix for bio.tools. Logically, this would be "biotools", giving e.g. biotools:signalp, which is trivially mappable to the (resolvable) https://bio.tools/signalp.

cc @hansioan @matuskalas @ekry
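
The prefix:value mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `split_curie` and `curie_to_url` are hypothetical, and only the "biotools" prefix and the https://bio.tools/ base mentioned in the comment are handled.

```python
BIOTOOLS_PREFIX = "biotools"        # the "canonical" prefix proposed above
BASE_URL = "https://bio.tools/"     # resolvable base, as in the comment


def split_curie(curie: str):
    """Split 'prefix:value' at the first colon; both parts must be non-empty."""
    prefix, sep, value = curie.partition(":")
    if not sep or not prefix or not value:
        raise ValueError(f"not of the form prefix:value: {curie!r}")
    return prefix, value


def curie_to_url(curie: str) -> str:
    """Map a biotools CURIE to its resolvable URL (other prefixes are rejected)."""
    prefix, value = split_curie(curie)
    if prefix != BIOTOOLS_PREFIX:
        raise ValueError(f"unsupported prefix: {prefix!r}")
    return BASE_URL + value


url = curie_to_url("biotools:signalp")  # "https://bio.tools/signalp"
```

Because the prefix carries the identifier type, application logic can dispatch on `split_curie(...)[0]` instead of requiring a separately supplied type.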

@joncison
Member Author

Going down this route, this is what the regex looks like for identifier->value:
[image: capture]

with examples:
[image: capture]

and in this case it (neatly) allows us to not have to mandate explicit typing of the supplied value separately, thus:
[image: capture]

@joncison
Member Author

joncison commented Nov 10, 2017

@matuskalas @hansioan @ekry - for now I will assume this is a reasonable fix, and my next commit will close this issue (it can always be reopened as needed). Like Matus, I'm not 100% sure whether the bio.tools ID should be under identifier or specified in its own element. Unless someone shouts, I'll leave it nested, as shown above.

@matuskalas
Member

This is all really awesome, epic developments around the bio.tools IDs (now and recently)! 👍

Please check and accept (i.e. merge) the pull request #93, which fixes also the last standing point of #79 (comment). After the merge, it will automatically and ultimately close this issue once for good! ;-)

@matuskalas
Member

The only thing left is #79 (comment), i.e. making the collection ID (and URI) match. As there are some further considerations that need to be (under)taken, I created a separate new issue for that: #94.

joncison added a commit that referenced this issue Nov 15, 2017
bio.tools ID to be a subset of xs:NCName (fixes #79)
@joncison
Member Author

@matuskalas, thanks for above!

Now that biotoolsID is in its own element and gives the "vanilla ID", e.g. "signalp" (with the biotoolsCURIE element for prefixed IDs, e.g. "biotools:signalp"), we can change the type of the biotoolsIdType simpleType (as used by biotoolsID) from xs:anyURI to NCName. My next commit will do this, and close this issue.

@joncison
Member Author

I ticked the 3rd box: xs:NCName is specified for biotoolsIdType simpleType thus biotoolsID wherever it occurs.

@joncison
Member Author

Following the discussion in bio-tools/biotoolsRegistry#284 biotoolsIdType (thus biotoolsID) is being reverted to xs:token with regex [_\-.0-9a-zA-Z]*.

but

next commit will add support for xs:NCName-compatible biotoolIDs to be specified in otherID->value, i.e. if it turns out we need such IDs, they can (will) be supported as alternative IDs. I hope this is a reasonable compromise and avoids the refactoring (bio-tools/biotoolsRegistry#285, bio-tools/biotoolsRegistry#284) and other complications that go with this.
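
To illustrate the difference between the two kinds of ID mentioned in this thread, here is a small sketch comparing the xs:token regex above with the NCName-compatible pattern proposed earlier. It is an approximation in Python, not the XSD itself: the constant and function names are hypothetical, `{,31}` is written as `{0,31}`, and `fullmatch` stands in for the implicit anchoring of XSD patterns.

```python
import re

# Regex from the reverted biotoolsIdType (xs:token), as quoted above.
TOKEN_ID = re.compile(r"[_\-.0-9a-zA-Z]*")

# NCName-compatible pattern suggested earlier in the thread: the first
# character must be a letter or underscore, at most 32 characters in total.
NCNAME_LIKE_ID = re.compile(r"[_a-zA-Z][_\-.0-9a-zA-Z]{0,31}")


def matches(pattern: re.Pattern, value: str) -> bool:
    """XSD patterns must match the whole value, hence fullmatch."""
    return pattern.fullmatch(value) is not None


for candidate in ["signalp", "3d-dna", "biotools:signalp"]:
    print(candidate, matches(TOKEN_ID, candidate), matches(NCNAME_LIKE_ID, candidate))
```

An ID starting with a digit (the made-up "3d-dna" here) passes the token regex but not the NCName-like one, which is the kind of ID that would need an alternative NCName-compatible form. Note also that the trailing `*` in the token regex accepts the empty string.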
