Community definition & approval of tool information standard #77
The standards checklist is missing the following attributes:
In the checklist I would change "Persistent human readable URL" to also contain "homepage". The word homepage offers a clearer picture of the goal of this attribute; furthermore, in the table below we mention it as Homepage. In my opinion, scientific input/output formats are by far the hardest properties to annotate; they take a lot of time to find. I would at least move the formats to the gold standard, if not data too, and move documentation and/or license (or some other attributes) to the silver standard.
For clarification, we need to be selective about including attributes in the standard (there are over 50 defined in biotoolsSchema); e.g. I don't see the case for including every type of link. What other types of link are you suggesting to include? I agree re data & format being hard to supply, but then these are high value. When deciding whether to include them at "silver" or "gold" we have to bear in mind we'll be setting the curation priority: good to hear what others think.
Data types and data formats are of course both important. The types provide context for the scientific operation. The current proposal draws the line between basic and silver between operation and data types. The data types are more abstract, like the operation, whereas the data formats are more concrete. Perhaps I would feel equally comfortable drawing the line between data types and a list of supported data formats. As you say, these are much more time consuming to supply (unless you are the developer). Often one would have to install and test the software with files of different formats and see what works, as file format support may not even be completely described in the software documentation. I agree with the order of the requirements and the cut between silver and gold. The only change I would consider would be to add data types to the basic standard, as long as annotators are still allowed to save an entry before reaching the "basic" level. Or is the idea to formally require at least the basic level?
Nice distinction!
+1 to Hans' suggestions to add more attributes such as collection and operating system. Operating system, language and download (download type) could go into silver.
Data and Format are difficult to annotate but, on the other hand, are very important for accurately describing a piece of software. We therefore need to push to get them into bio.tools as much as possible, leaving them either at silver or even moving Data to basic.
Regardless of whether we want to add more attributes to any of the standards, it would be good to have a place where we mention (and possibly describe) all of them. I've looked at: As a UI-related point, it would be nice at registration (or validation) time, and in the search results, to show (e.g. using icons) the information standard level of each tool. My opinion on EDAM data and formats is that they take around 50% of the entire annotation process, just because they are hard to get right unless you're the developer. I believe that the "basic" standard should be a fairly straightforward, but mandatory, goal to reach. Another point I want to make is that, if one reaches the "silver" standard by annotating all the required "silver" standard attributes, getting to gold is a trivial task, as one would only add a handful of attributes which are fairly easy to find. Whether the hard jump should be from "basic" to "silver" or from "silver" to "gold", I am unsure.
@joncison There are some ...
Thanks everyone! Already we have some excellent inputs; for now (until we hear from others) I'll just pick up on your questions: :: "Or is the idea to formally require at least the basic level?" :: "Regardless if we want to add more attributes to any of the standards, it would be good to have a place where we mention (and possibly describe) all of them."
Oh, I'm not sure about including that ... Thanks Hans for spotting '>>' (fixed)
The whole page looks fine to me, but there is a degree of ambiguity between two names used for the same thing: software type and tool type. It would probably be more consistent to use one name; otherwise people might think these are two distinct pieces of information.
Thanks ... will fix. Bearing in mind the standards (minimum, silver, gold) will set curation priorities, are the right attributes listed at the right level? Are attributes defined in the schema (including enum options) which should be in the standard, but which are not currently listed? cc @matuskalas for his inputs too.
I've given it some thought and here are my points, in no particular order:
Thanks, some really good points there. For now (until we hear from others a bit more) I'll just pick up on a few points: :: "We should rethink what the exact purpose of this is, and design it to fit this purpose. "
I agree it would be really nice to combine it with a LinkedIn-style completeness; we have 3 levels currently (actually 4 if you include "sub-minimum") and we could add 1 more; this would yield 5 levels or 5 stars. In this case, we could replace "minimum", "silver", and "gold" with a simple 5-star rating:
Completely agree with quality requirements, which we can develop once we settle the basic standard / attributes. It could also include things like "EDAM annotations cannot be top-level". Cool URLs / IDs are important! (at least important enough to be mindful there is no grot). The requirement could simply be "tool ID (from name) has been manually inspected" (i.e. what we've already been discussing / doing for existing entries)
Quick update from discussion with @ekry:
My thoughts on the Gold standard exactly. Instead of Repository, Mailing list and Issue tracker, which some tools don't even have, I would include more useful fields (from the point of view of a bio.tools user) like Operating systems, Cost/Accessibility and Maturity. These can be provided for all tools and to me are even more useful for searching, one of the main purposes of bio.tools (tools for Linux vs. tools with a mailing list?). Data formats are really difficult to annotate (from our CZ experience as curators), so I would move them to Gold; Data types (I/O) seem to be fine in Silver (or some ?-star equivalent).
@ekry & I spoke about this in person; the conclusion was we need to clarify what's meant by '?', tick and cross with respect to the standard, e.g. for issue tracker.
So e.g. "gold standard" could allow a cross or a tick, but not '?' (the implication is the bio.tools UI would need to support positive statements that such-and-such attribute is not available). In general, I tend to agree: the standard should reflect annotation quality rather than tool or project "quality" (although of course the first can give an indication of the second two).
I agree with pretty much all that @ekry wrote 👍 @joncison, could we please put https://github.com/bio-tools/biotoolsSchemaDocs/blob/master/information_requirement.rst#information-requirement into a Google Doc instead, so that people can comment on concrete points and suggest additions or edits directly? This discussion has become unmanageable, as the issue is very multi-faceted.
OK, I'll try to do that once I make a revision based on the entire thread above, but first I want to hear from the EE guys (cc @AhTo123 !) and the SIB guys.
First, the two most obvious criticisms:
Sorry for sounding mostly negative, but these are the issues that need to be fixed before going out with it, in order not to make the bio.tools project (& us) look insane. Otherwise I'm super positive about the idea of developing an info standard, but it must be done properly.
The idea for the (levels in the) standard was not to assign a score, but rather to list definite attributes that have to be specified (or, alternatively, to have registered a definite statement that such a thing is not available). Bear in mind a key purpose is to guide the curator's hand, such that they know what attributes they must specify when improving an entry. Scoring an entry, e.g. to get a LinkedIn-style entry completeness, could be a nice complement (and would need attribute weights). Agreed - the standard must respect the tool types (we had to start somewhere): it would be good to get a list of key attributes in each case - to use also in the emerging Curators Guide (http://biotools.readthedocs.io/en/latest/curators_guide.html)
I'd like to propose that we include an attribute called "version history" that would provide the following information: This attribute - version history - should be provided in order for an entry to be assigned 'gold standard'.
Severine says ... "for me there is a confusion between the quality of the annotation and the quality of the resource itself. In other words, I can spend hours describing a resource and end up with a very high quality annotation, but, if, for whatever reason, the resource provider did not set up a mailing list, then my annotation will only be considered as "Silver"." Jon says ... "this underlines a point made above and it will be addressed in the next iteration. Quite simply, we can insist on such things as "Issue tracker" being annotated, but only if in fact they are available. The implication is we need the bio.tools UI to support positive statements that such-and-such a thing does not exist."
:: "year of 1st version of entry " :: "year of current version of entry" The existing In addition, the plan indeed is to allow association of a version ID with a publication ID (i.e. treat these as |
UPDATE: An iteration of the proposed standard, where I try to capture all points above and other conversations. In summary, "gold" etc. is replaced by a 5-star rating (where 0 stars == sub-minimal):
I'm not sure I really like dropping Operation, but it's food for thought ... Crucially, attributes are only required if they are, in fact, available, the implication being ( @ekry !) that we need to support positive statements in the bio.tools UI that e.g. an issue tracker is not available. There are also type-specific requirements. There's an open question as to whether manual inspection / validation of (all? specific?) attributes at (all? specific?) tiers is needed: my vote is probably that, to get any star rating at all, inspection by bio.tools admin (or another trusted party) is needed; I don't see any other reliable way of assuring quality in every case ... thoughts? Great to now have more comments from everyone on this version ... PS. comments on it are welcome in a separate thread:
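To make the cumulative-tier idea concrete, here is a minimal sketch of how a star rating could be derived from an entry; the tier contents and attribute names below are illustrative placeholders, not the actual proposed standard:

```python
# Hypothetical sketch: derive a star rating from which attributes an entry defines.
# Tier contents are examples only and do not reproduce the real standard.
TIERS = [
    {"name", "description", "homepage"},          # 1 star
    {"tool_type", "topic"},                       # 2 stars
    {"operation", "publication"},                 # 3 stars
    {"data_type", "documentation", "license"},    # 4 stars
    {"data_format", "cost", "maturity"},          # 5 stars
]

def star_rating(entry: dict) -> int:
    """Return the number of stars earned; 0 == sub-minimal."""
    defined = {k for k, v in entry.items() if v not in (None, "", [])}
    stars = 0
    for required in TIERS:
        if required <= defined:     # every attribute of this tier is present
            stars += 1
        else:
            break                   # tiers are cumulative: stop at the first gap
    return stars

print(star_rating({"name": "FooTool", "description": "does X", "homepage": "http://example.org"}))  # 1
```

The cumulative check means a missing attribute in a lower tier caps the rating, mirroring the intent of guiding the curator towards the next concrete set of attributes to supply.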
I find it strange that "Operating system" and "Language" (I guess this means "programming language", not the human language of the interface) are at the "gold" level. These are easy-to-fill pieces of information, and of primary importance for a potential user to know if the tool is usable for them (at least for stand-alone tools). In addition, some tools are distributed in the form of libraries (e.g. Python, R library). Should "operating system" and "language" not be at the "minimal" level (one star), at least for stand-alone tools (command-line or with a specific GUI) and for libraries? Cheers, Jacques
Hi, from a text mining perspective, language is crucial, but it refers to the language of input texts (users need to know if a given tool is for English or French texts). Another important requirement is the possibility to have sample input/output; this allows (i) a quick execution for testing purposes and (ii) an exact knowledge of what the tool does. I know these things are not in the schema ...
I think footnotes 6 and 7 might be a little bit too specific:
The second point I want to mention is that, where we have attributes that might not apply to all tools, we need to have a placeholder annotation (e.g. N/A) to indicate that the attribute has been inspected and marked as not applicable, rather than just missing information. The final point is that, technically, a tool can jump from 2 stars to 5 stars by just adding Cost, Accessibility and Maturity, because all the others are conditioned on 'if applicable'. I don't know if this is a big problem, but it might be a way in which one can 'exploit the system' to reach 5 stars.
A few notes from dev meeting this morning (please @hansioan @ekry @Baileqi @AhTo123 fill in any gaps):
I think I favour option 2 but would like to hear what others think
Hi! I hope I'm not too late to the party, but I'd personally like to also see the container and CWL fields at least in "gold" - apart from automated benchmarking I think that these things, especially the availability of a downloadable container, can be of interest for many users. "Source code" is nice, but the way from "I have the source" to "I can run it" can be long at times. In general, I'm not a huge fan of giving a "star" rating based on simply the number of additional attributes that have been provided, since that does not really give, at a glance, a feeling for what kind of metadata is available - and that, for me, should be the goal of this kind of rating. I cannot imagine a use-case where I am looking for tools that have "at least 5 metadata-fields filled out". An alternative suggestion: There could be, instead of stars or cherries, groups of metadata fields. Examples could be "Code availability" (this could be the fields "Repository", "Source code", "License"), "Documentation" (e.g. "API doc/spec", "General documentation", "Supported data format"), "Community" (e.g. "Issue tracker", "Mailing list"). And whenever all fields in a group are filled out, the tool gets the associated icon (or maybe if only one is filled, it gets the bronze version of the icon, if more the silver version, and if all the gold version). In that way, the completeness of metadata concerning specific topics of interest would be immediately visible, and there would still be a kind of "x stars" rating in the way of having a certain number of icons, but it would not try to sort these into tiers, working around the problem that for misc attributes, it's arbitrary at what tier they should be listed. So you could kind of have "star 4" without first having to have "star 2" and "star 3". Just my two cents ;) |
Not at all, inputs appreciated! I really like the idea of splitting out the "misc attributes" into thematic groupings - esp. because it chimes with the idea of metrics reflecting project maturity also. If we did that, it would leave the information standard with core attributes that are mandatory at different tiers. |
Agreed, as such a grouping could help to deal with the overlap between metric and information measures. With a similar system for the metric and information standards, an interested person can then decide for themselves which grouping they consider relevant for their assessment.
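For illustration, a small sketch of how the thematic grouping proposed above might be computed; the group definitions and the bronze/silver/gold thresholds are made-up assumptions, not anything agreed for bio.tools:

```python
# Hypothetical per-group badges: bronze if any field in the group is filled,
# silver if more than half, gold if all. Group contents are invented examples.
GROUPS = {
    "Code availability": ["repository", "source_code", "license"],
    "Documentation":     ["api_documentation", "general_documentation", "data_format"],
    "Community":         ["issue_tracker", "mailing_list"],
}

def group_badges(entry: dict) -> dict:
    badges = {}
    for group, fields in GROUPS.items():
        filled = sum(1 for f in fields if entry.get(f))
        if filled == len(fields):
            badges[group] = "gold"
        elif filled > len(fields) / 2:
            badges[group] = "silver"
        elif filled > 0:
            badges[group] = "bronze"
        else:
            badges[group] = None          # no icon shown for this group
    return badges

print(group_badges({"repository": "https://github.com/x/y", "license": "MIT"}))
# {'Code availability': 'silver', 'Documentation': None, 'Community': None}
```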
UPDATE: A new iteration is available. Highlights:
Response to specific points:
:: I find it strange that "Operating system" and "Language" (I guess this means "programming language", not the human language of the interface) are at the "gold" level.
:: Should "operating system" and "language" not be at the "minimal" level (one star), at least for stand-alone tools (command-line or with a specific GUI) and for libraries?
:: From a text mining perspective, language is crucial but it refers to the language of input texts (users need to know if a given tool is for English or French texts).
:: Another important requirement is the possibility to have sample input/output; this allows (i) a quick execution for testing purposes and (ii) an exact knowledge of what the tool does.
:: The final point is that, technically, a tool can jump from 2 stars to 5 stars by just adding Cost, Accessibility and Maturity because all the others are 'if applicable' conditioned. I don't know if this is a big problem, but it might be a way in which one can 'exploit the system' to reach 5 stars.
:: I think footnotes 6 and 7 might be a little bit too specific:
:: The second point I want to mention is that, where we have attributes that might not apply to all tools, we need to have a placeholder annotation (e.g. N/A) to indicate that the attribute has been inspected and marked as not applicable, rather than just missing information.
:: I'd like to also see the container and CWL fields at least in "gold" - apart from automated benchmarking I think that these things, especially the availability of a downloadable container, can be of interest for many users.
:: An alternative suggestion: ....
:: With a similar system for metric and information standard, an interested person then can decide by their own which grouping they consider relevant for their assessment.
To do
How do you specify "not being available"?
Technically this is not settled, but it would require a tweak to biotoolsSchema, the corresponding bio.tools backend and JSON model, and the bio.tools UI. A couple of options spring to mind; rather than discuss them here, I created a separate thread. Thanks!
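Purely to illustrate what such a tweak might look like (this is an assumption, not the option chosen in the separate thread or the actual biotoolsSchema / JSON model), one could distinguish an attribute that has never been annotated from one that a curator has explicitly, and date-stamped, marked as not available:

```python
# Hypothetical illustration only -- not the agreed biotoolsSchema change.
# One option: distinguish "never annotated" (key absent) from an explicit,
# date-stamped statement that the attribute does not exist.
entry = {
    "publication": {"status": "not_available",        # curator states none exists
                    "stated_on": "2017-06-16"},
    "license": {"status": "available", "value": "MIT"},
    # "issue_tracker" absent altogether == unknown / not yet annotated
}

def attribute_state(entry: dict, attr: str) -> str:
    if attr not in entry:
        return "unknown"
    return entry[attr]["status"]          # "available" or "not_available"

print(attribute_state(entry, "publication"))    # not_available
print(attribute_state(entry, "issue_tracker"))  # unknown
```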
I think this current iteration looks good. A few points from me:
I think this looks really nice. I would just, in addition to the smilies, separately show whether the requirements for the groups (Documentation, Accessibility etc.) are fulfilled. That way, metadata availability would not be masked by the smiley levels - e.g. in the current case, it would not be relevant/visible for the rating whether binaries and code are available if the documentation is lacking.
I am happy with the general direction and the proposed standard draft! The priority of requirements is better than an overall percentage of total completeness, as at least some of the information required for "Good" is essential to be able to do anything at all, and other information is far less critical. The ordered list also suggests a workflow for annotators. Just a minor comment on Matúš's comment above that licensing information is irrelevant for web apps. This may be true for many of them, but certainly not all. Sometimes there are licenses attached to the output, e.g. ProteomicsDB or the underlying cartography in many geoparsers. Having an optional N/A in the annotation of an individual tool should be sufficient. M.
The whole effort is really nice, and the information model is good. What is the difference between "mandatory" and "okay" attributes? Does one have to specify all "okay" attributes to create an entry? It seems to be so, and I'm worried that asking for e.g. a publication might be an unnecessary block to the registration of entries - some good tools have no publication, but more importantly some might be registered with a "pending" publication.
I think that publication can be a mandatory entry, but that entry can be 'not available'. However, I agree that a publication may also never be available. And especially for a publication, 'not available' could mean 'there is none' or 'I don't know' (if the person filling in the entry is not the tool author). Olivier
Thanks folks, again, for some useful comments. For now I'll just pick up on: :: "What is the difference between "mandatory" and "okay" attributes? does one have to specify all okay attributes to create an entry? it seems to be so, and I'm worried that asking e.g. a publication might be an unnecessary block to the registration of entries - some good tools have no publication, but more importantly some might be registered with a "pending" publication." The test for publication (indeed all attributes labelled with an asterisk) can indeed be passed if it is specified as "Not available", the intended meaning being that the annotator is fairly sure it doesn't exist ... we'll have to make this clear in the UI somehow, and check it's not being (mis|ab)used. More generally, there's been an ongoing discussion re flexibility about the type of publication asked for, e.g. for BioConductor packages, the main BioConductor publication would be OK. An online manual might even be allowed (but then we'd have to support specification of a URL). The idea is, in the Autumn, to review what lacks a publication and then make a decision. Speak to some of you at 11AM CE(S)T today.
Pasted from the meeting chat window, 2017-06-16:
With the "Verified" stamps, we need to clearly (for end user especially) distinguish between adherence to annotation guidelines VERSUS adherence to reality i.e. validity of the information. |
My conclusions / notes from the meeting this morning (NB: these are not meeting minutes):

Applicability versus availability of information
Whether an attribute is applicable to a type of tool (i.e. as indicated by asterisks in the current proposal) is for bio.tools to define. Whether an attribute is available is for the editor of an entry to specify. We must assume the integrity of our contributors, who will not misuse this feature (but check and take remedial action as necessary). @matuskalas agreed to provide a matrix of tool types versus attributes indicating applicability, with inputs from @hansioan. This should ignore obscure edge cases: even if an attribute is not (normally) applicable, it can still be specified during registration.

Date stamping "not available" annotations
It is useful to know (certainly by bio.tools admin, and maybe users) when e.g. a publication was stated as being "Not available". This also goes for license, and probably (but less strongly) for other annotations (all annotations that can be annotated as "Not available"? all annotations?). This could help inform periodic updates of the registry. We wouldn't want to clutter the UI with such information though.

Consideration of tool type information
The matrix of tool types::attributes (applicability) - mentioned above - is necessary and sufficient for the current proposed model. It also lays the foundation for a more sophisticated, tool-type specific standard in the future which could, in principle, better reflect the specific information requirements of each type of tool.

Number of tiers in model
@matuskalas suggested 5 tiers is too many. My opinion is that 5 is OK, considering that the 1st ("NEEDS TO IMPROVE") I hope will not be much used, and that we need a few tiers to motivate incremental improvement of entries ("curation as a game").

Stability of model
The model will necessarily need to change (a bit at least) in years to come, but we must handle change very carefully, especially to ensure e.g. that an entry labelled today as "EXCELLENT" does not suddenly become "NEEDS TO IMPROVE" (in such cases, entries probably should be improved retrospectively before applying any new standard). Restricting potentially "breaking" changes to once per year (i.e. as per biotoolsSchema) seems sensible.

Labeling entries
To clarify, two types of label will be applied:

Verification of entries
This (c|w)ould be toxic if carefully annotated entries do not get a "VERIFIED" stamp; on the other hand, labeling is very important to identify / reward such high quality curation efforts. The verification burden should not purely be on bio.tools admin (to avoid this being a blocker). The solution is to allow anyone (but especially bio.tools admin, Thematic Editors and other trusted curators) to verify entries. The requirement is effective tooling for this, such that a curator can tick off guidelines that have been satisfied (we have a mock-up, but it's not yet public). A verification date is needed, and also a way to indicate whether the entry has been updated since it was verified (e.g. by changing the colour of the stamp).

Other
A number of other points were raised not directly related to the standard:
See other notes from the meeting.
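As a sketch of the applicability matrix mentioned above (the tool types and attribute sets here are invented examples; the real matrix is the one @matuskalas agreed to draft):

```python
# Hypothetical tool type vs. attribute applicability lookup; contents are illustrative.
APPLICABLE = {
    "Web application":   {"homepage", "documentation", "publication", "issue_tracker"},
    "Command-line tool": {"homepage", "documentation", "publication", "issue_tracker",
                          "operating_system", "language", "source_code", "license"},
    "Library":           {"homepage", "documentation", "publication",
                          "language", "source_code", "license"},
}

def is_applicable(tool_type: str, attribute: str) -> bool:
    """True if the attribute should count towards the standard for this tool type."""
    return attribute in APPLICABLE.get(tool_type, set())

print(is_applicable("Web application", "operating_system"))    # False
print(is_applicable("Command-line tool", "operating_system"))  # True
```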
Our only comment is whether this standard supports an operation that is performed using two different programs in sequence; for example, does the code availability attribute allow describing both programs?
The final minutes of the WP-1 centered bio.tools status meeting on INFORMATION STANDARDS:

Apologies

Minutes
The idea of 'revising the standards on an annual basis' is challenging. Four standard tiers/labels are contemplated (OKAY, GOOD, VERY GOOD, EXCELLENT) that are all of 'acceptable' quality. A fifth label (NEEDS TO IMPROVE) is for entries which lack basic information. Each label is associated with a set of attributes. The set of attributes required to earn a label - or the list of allowed sub-domains to tick a particular attribute - could in principle be changed, if practical experience shows it would be valuable. And so we envision revisiting, with caution, the set of four (five) standards on an annual basis, with input from the community. BUT, by all means, any future change in the standards must not deprive a tool of an 'earned' label, or lead to a 'greying' of an annotation void. Rather, such changes should apply to future earning of labels, and be presented in the 'background guide info for curators' for verified-label tools that now need more annotation work.

Annotation of Not applicable, None exists, Unknown, and Need Updating
These terms are all valuable information, and should be carefully and individually assigned as annotation options, for all attributes. MK made the point that distinct tool types warrant a distinct set of attributes, in order to avoid numerous 'not applicable' annotation results. It was agreed that MK will draft a matrix (tool types vs attributes) that will help decide if some tool types should indeed be assigned a distinct set of attributes, and if not, will at least help capture the adequacy of annotating 'not applicable' for a given tool attribute.

Annotation metrics - assessing quantitative measures of input quality
This point was made by MK, who wants to assess the registry's total amount of annotated information for a given attribute. Other obvious quantitative measures include amount of information (most simply, number of JSON/XML nodes); last modified (time since); last new version (time since); last scientific publication (time since). This will help us monitor the overall progress on quality of the registry as a supplement to tracking number of users and number of entries (quantity).

Date-stamps
The annotation 'None exists' should be time stamped, because it may be relevant to update the information.

Verification of labels
Several arguments were made for and against a date-stamped verified label of a given tool, in particular if we're dealing with manual verification of earning a given label: 1) this could be seen as censure by the developers, which would counteract their willingness to simply supply the best possible annotation/information on a given tool, 2) it is labour-intensive and possibly old-fashioned (not Wiki-like), 3) there is a danger that the verification process is of lower quality than the annotation process itself. On the other hand, the end user may better trust a manually verified, date-stamped label. We need to consider the needs of developers (the best providers of info) and of the end-user (a trust issue). This includes the possibility of developing a machine-learning-driven autocurator (Action: Piotr Chmura). It is possible that the resources spent on verification would be better spent improving the annotation.
UPDATE: Various minor tweaks have been made to the information standard. I think this is a "respectable beta" and implementation in bio.tools will now proceed. Replies to specific points (not already addressed) above follow: :: The final point is that, technically, a tool can jump from 2 stars to 5 stars by just adding Cost, Accessibility and Maturity because all the others are 'if applicable' conditioned. I don't know if this is a big problem, but it might be a way in which one can 'exploit the system' to reach 5 stars. Operation (always applicable) is now mandatory at tier 3. More generally we have to assume honesty. To get the "Verified" stamp an entry will have to be manually verified, and the name of the verifier should be rendered next to the stamp (along with the date) - this should discourage abuse. :: how do you specify "not being available" ? :: GOOD: we should start with a positive statement before the "would benefit" part. Example: The entry is annotated at a decent/good information level/standard, but .... I take the point, but there's a limit to what can be squeezed into a useful infographic and besides, this doesn't add to what's already said by "GOOD", "VERY GOOD" and "EXCELLENT". :: I think this looks really nice. I would just, in addition to the smilies, separately show whether the requirements for the groups (Documentation, Accessibility etc.) are fulfilled. That way, metadata availability would not be masked by the smiley levels - e.g. in the current case, it would not be relevant/visible for the rating whether binaries and code are available if the documentation is lacking. This needs more thought, and we should try to define what metrics / labels would be useful, including capturing some things from the groups mentioned. Let's take this discussion elsewhere, e.g.
:: What is the difference between "mandatory" and "okay" attributes? does one have to specify all okay attributes to create an entry? it seems to be so, and I'm worried that asking e.g. a publication might be an unnecessary block to the registration of entries - some good tools have no publication, but more importantly some might be registered with a "pending" publication. We’ll need to settle two things; i.e. what level in the standard is required:
i.e. not all entries will be visible, at least by default (possibly configurable by users). Publication is marked as "if applicable and available", i.e. a tool can be registered without one, but the user will need to specify explicitly that it is not available. :: However i agree that a publication may also be never available. And specially for a pub, a 'not available' could mean 'there is none' or 'i dont know' (if filling entry but not tool author). We'll need to spell out (in the UI, everywhere) that "Not available" means "to the best of my knowledge the tool has not been published". A footnote: in the next release ~95% of entries will have at least one publication ID (often an overarching publication, e.g. for BioConductor packages the main BioConductor paper is OK). :: Our only comment is whether this standard supports an operation that is performed using two different programs in sequence. For example, the code availability allows describing both programs? This is more an issue of modelling in biotoolsSchema (https://github.com/bio-tools/biotoolsschema). In the case of workflows, I'm not sure that the "Source code" attribute would be applicable. :: Annotation of Not applicable, None exists, Unknown, and Need Updating. These terms are all valuable information, and should be carefully and individually assigned as annotation options, for all attributes. Not exactly.
:: Several arguments were made for and against a Date-stamped verified label of a given tool ... I'm not sure the arguments there hold water, given that 1) anyone will be able to verify entries and 2) without some verification process, the standard could only ever describe entry completeness and not quality.
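A minimal sketch of how such a "Verified" stamp might be rendered, assuming made-up field names for the verifier, verification date and last-modified date; the greying-out behaviour follows the earlier meeting note about indicating that an entry has changed since it was verified:

```python
# Illustrative only: field names ("verification", "by", "date", "last_modified") are invented.
from datetime import date

def stamp(entry: dict) -> str:
    """Render a verification stamp; grey it out if the entry changed afterwards."""
    v = entry.get("verification")
    if not v:
        return "not verified"
    colour = "grey" if entry["last_modified"] > v["date"] else "green"
    return f"Verified by {v['by']} on {v['date']} [{colour}]"

print(stamp({"last_modified": date(2017, 9, 1),
             "verification": {"by": "bio.tools admin", "date": date(2017, 6, 16)}}))
# Verified by bio.tools admin on 2017-06-16 [grey]
```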
Heinz says ... "even the "excellent" label is only "quite" good. Gives the impression that the work is never finished :-) It needs to be carefully communicated so users of the portal don't confuse labelling of the entry in bio.tools with the quality of the resource/tool that is annotated. An "OKAY"-labeled resource can very well be a perfect resource ... users usually care more about the latter"
In light of discussions at the DKBC conference in DK (Aug), I made a revision (https://github.com/bio-tools/biotoolsSchemaDocs/blob/master/information_requirement.rst)
This is sufficient to calculate "entry completeness" on bio.tools as it stands today (cc @hansioan, @ekry, @hhbj). We also agreed to take a shot at per-attribute standards, e.g. what constitutes a "GOOD" description etc. This is a work in progress.
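As a rough illustration of what an "entry completeness" calculation could look like (the attribute list and weights are placeholders, not the published standard):

```python
# Minimal sketch of a LinkedIn-style completeness score; weights are invented.
WEIGHTS = {
    "description": 2, "homepage": 2, "tool_type": 1, "topic": 1,
    "operation": 2, "publication": 2, "documentation": 1, "license": 1,
    "data_type": 1, "data_format": 1,
}

def completeness(entry: dict) -> float:
    """Entry completeness as a percentage of the total attainable weight."""
    earned = sum(w for attr, w in WEIGHTS.items() if entry.get(attr))
    return 100.0 * earned / sum(WEIGHTS.values())

print(round(completeness({"description": "x", "homepage": "http://example.org", "license": "MIT"}), 1))  # 35.7
```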
Folks, another version now at https://github.com/bio-tools/biotoolsSchemaDocs/blob/master/information_requirement.rst This time providing major clarifications around possibilities for labelling, separating out the notions of "entry completeness", "valid syntax", "manually inspected" and "conformance to guidelines". The tiers now reflect an aggregation of these things (look and you'll see) |
UPDATE: Following various discussions we can now settle a respectable first beta version of the tool information standard (https://github.com/bio-tools/biotoolsSchemaDocs/blob/master/information_requirement.rst). Small but significant changes include:
In the 1st instance, it will be applied to bio.tools in two ways:
Whether / how to label entries can wait for later. biotoolsSchema (https://github.com/bio-tools/biotoolsschema) will be revised to constrain at the level of "SPARSE" (#88). It would be good to get some feedback, because there have been many changes in the last few months.
UPDATE: Various improvements to make the standard simpler and more usable:
The big change is that the "not available-ness" of attributes is now much more tractable (see #82) and formal specification of "applicable-ness" is simply no longer required. cc @hansioan (but everyone, please comment) ...
Dear @hhbj, @cossorzano, @matuskalas, @osallou, @magnuspalmblad, @dabrowskiw, @hansioan, @martavillegas, @jvanheld, @krab1k, @ekry: Hans et al. have been very busy improving the bio.tools content according to our "respectable beta" information standard for tools. Next month, I plan to start writing up the work on the definition of the standard, its application to bio.tools and the broader context as a little publication, possibly in Briefings in Bioinformatics or maybe F1000. If you'd like to be involved with that publication, please email me now ([email protected]) and I'll provide a link to the Google document for the article. For now, I'm closing this issue. Thanks for all the work thus far!
As part of a drive to improve the quality of content in https://bio.tools, we now have a candidate information standard, which specifies the attributes that must be defined for an entry to be nominated (and possibly, eventually, labelled) as of "minimal", "silver" or "gold" standard quality. The standard is based upon biotoolsSchema (https://github.com/bio-tools/biotoolsschema) and will underpin bio.tools quality metrics and KPIs (under development), guiding all future curation work.
See:
https://github.com/bio-tools/biotoolsSchemaDocs/blob/master/information_requirement.rst#information-requirement
Please can I get some feedback here?