Bio.Tools metrics: quality, quantity, progress, and contribution indicators #113

Open
matuskalas opened this issue Oct 25, 2016 · 10 comments
Labels: complex feature request (We expect this will be hard to do.); metrics and meta-registry (Concerns metadata calculated from bio.tools content, e.g. metrics.)

Comments

@matuskalas
Member

matuskalas commented Oct 25, 2016

As a preamble to this topic, a great example: https://bio.tools/tool/Galaxy/version/none 👎

Here comes a list of metrics/indicators that are EXTREMELY EASY TO IMPLEMENT, while at the same time excellent indicators of quality, quantity, contribution, and progress.

Notes: The SIMPLEST and most relevant indicators are in bold. The rest are additional that are similarly SIMPLE and relevant, but less general (more specific). All the following has been mentioned and discussed regularly since the EMBRACE Registry times, repeatedly in various meetings in Amsterdam and Lyngby, including Kristoffer, @ekry, @joncison, @hmenager, Łukasz, me, Manchester folks, and Gert.

A. Basic (=) quantity metrics

1. # of attributes (nodes or leaves in the JSON/XML tree; summed over all entries)
2. # of operations (with at least one EDAM data concept ≠ 0006, as input or output, and at least one EDAM operation concept ≠ 0004; summed over all entries)
3. # of entries

These 3 should certainly be shown also on the top of the Bio.Tools "home page", and then also on the top of each list/table of search results (then of course per the found entries).
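A minimal sketch of how metrics 1. - 3. could be computed from a JSON dump of entries. The field names (`function`, `operation`, `input`, `output`, `data`, `uri`) are assumptions about the entry layout, not something confirmed in this thread, and "attributes" is taken here to mean leaves of the tree only.

```python
from typing import Any

ROOT_DATA = "data_0006"            # EDAM root concept "Data"
ROOT_OPERATION = "operation_0004"  # EDAM root concept "Operation"


def count_leaves(node: Any) -> int:
    """Count leaf values in a nested JSON-like structure (metric 1)."""
    if isinstance(node, dict):
        return sum(count_leaves(v) for v in node.values())
    if isinstance(node, list):
        return sum(count_leaves(v) for v in node)
    return 1


def is_specific(uri: str, root: str) -> bool:
    """True if the URI names an EDAM concept other than the given root."""
    return bool(uri) and not uri.endswith(root)


def is_annotated(function: dict) -> bool:
    """Metric 2 condition: a non-root operation and a non-root input/output."""
    ops = [o.get("uri", "") for o in function.get("operation", [])]
    data = [io.get("data", {}).get("uri", "")
            for key in ("input", "output")
            for io in function.get(key, [])]
    return (any(is_specific(u, ROOT_OPERATION) for u in ops)
            and any(is_specific(u, ROOT_DATA) for u in data))


def quantity_metrics(entries: list) -> dict:
    """Return metrics 1. - 3. summed over all entries."""
    return {
        "attributes": sum(count_leaves(e) for e in entries),
        "operations": sum(is_annotated(f)
                          for e in entries
                          for f in e.get("function", [])),
        "entries": len(entries),
    }
```

Running the same function over a list of search results rather than the whole registry would give the per-result-set numbers mentioned above.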

B. Community (=) contribution metrics

4. # of updates of an entry (summed over all entries)
5. # of individual registrants/curators (especially nice after the anonymous registrant groups a.k.a. "affiliations" are split into real users)
6. # of registrant/curator institutions
7. # of authors/developers/contributors in Credits
8. # of institutions in Credits
9. # of publications (with distinct DOIs)
10. # of public repositories (GitHub etc.), and similar useful & non-mandatory attributes
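As an illustration of one of these counts, metric 9. could be sketched as below. The `publication` and `doi` field names are assumptions about the entry layout. DOIs are compared case-insensitively, since DOI names are not case-sensitive.

```python
def count_distinct_dois(entries: list) -> int:
    """Count publications with distinct DOIs across all entries (metric 9)."""
    dois = set()
    for entry in entries:
        for publication in entry.get("publication", []):
            # Normalise so that the same DOI in different case counts once.
            doi = (publication.get("doi") or "").strip().lower()
            if doi:
                dois.add(doi)
    return len(dois)
```

The other counts in this group (registrants, institutions, repositories) would follow the same pattern of collecting normalised values into a set.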

C. Quality metrics (for the whole registry)

11. 1. ÷ 3. -- i.e. # of attributes per # of entries
12. 2. ÷ 3. -- i.e. # of operations per # of entries
13. 4. ÷ 3. -- i.e. # of all entry updates per # of entries
14. 5. ÷ 3. -- i.e. # of registrants/curators per # of entries

15. 7. ÷ 3. -- i.e. # of authors/developers/contributors per # of entries (possibly etc. with 6., 8. - 10.)
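The ratios 11. - 15. are plain divisions by the number of entries. A minimal sketch, assuming the totals are already available in a dictionary with a hypothetical `entries` key:

```python
def per_entry_ratios(totals: dict) -> dict:
    """Divide every total by the number of entries (metrics 11. - 15.)."""
    n_entries = totals["entries"]
    if n_entries == 0:
        # An empty registry has no meaningful averages.
        return {}
    return {
        f"{name}_per_entry": value / n_entries
        for name, value in totals.items()
        if name != "entries"
    }
```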

D. Progress visualisation

  • The growth of ALL THE METRICS ABOVE over time (especially 1. - 5. and 11. - 14.; with per-day resolution)
  • Note: A separate report should be published where the above growth curves are plotted on the time-line together with hackathons' and workshops' dates marked.
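A per-day growth curve for metric 3. could be sketched as below, assuming each entry carries the date on which it was added (the input list of dates is hypothetical). Only days on which something was added appear in the result.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_entries_per_day(addition_dates: list) -> list:
    """Return (day, total entries so far) pairs in chronological order."""
    added_per_day = Counter(addition_dates)
    days = sorted(added_per_day)
    running_totals = accumulate(added_per_day[day] for day in days)
    return list(zip(days, running_totals))
```

The same accumulation would apply to any of the other counts, given a dated event per increment.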

Note:

All the indicators A. - D. can also be internally (within ELIXIR-EXCELERATE WP1) reported PER PARTNER plus per "the rest of the contributors (i.e. non-EL-EX-WP1)". The only required dependency is to first manually split all registrants into the "outreach and support spheres" per EL-EX-WP1 partner. Some registrants can fall under multiple partners, e.g. all de.NBI ones are supported by DK+NO+FR.

E. Quality metrics (for one entry)

  • The same as 1. - 2. and 4. - 10., BUT FOR THE GIVEN ENTRY
  • The same as above, BUT PER CURRENT AVERAGES (i.e. per 11. - 15.)
  • Note: Both of these can be beautifully visualised with some pretty tiny icons in the entry cards/rows, and even in the future taken into account when sorting search results.
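One possible reading of "per current averages" is the entry's value divided by the registry average, so that 1.0 means "exactly average". A sketch under that assumption, with hypothetical metric names:

```python
def relative_to_average(entry_values: dict, registry_averages: dict) -> dict:
    """Express each per-entry metric as a multiple of the registry average."""
    return {
        name: entry_values[name] / average
        for name, average in registry_averages.items()
        # Skip metrics the entry lacks and averages of zero.
        if average and name in entry_values
    }
```

Values above or below 1.0 could then drive the icons in the entry cards or a sort key for search results.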
@matuskalas
Member Author

One more note:

"2. # of operations (with at least one EDAM data concept ≠ 0006, as input or output, and at least one EDAM operation concept ≠ 0004; summed over all entries)"

means: # of DISTINCT operations within a Bio.Tools entry, where each can have multiple functions i.e. EDAM operation concepts ≠ 0004.

That leads to another SIMPLE and relevant metric 2.5:

2.5. # of functions (i.e. # of EDAM operation concepts ≠ 0004, in operations with at least one EDAM data concept ≠ 0006, as input or output. That means that useless operations with neither inputs nor outputs are ignored, just like in 2.)

Noteworthy, both 2. and 2.5 are relevant and SIMPLE, each important and motivating for good annotations separately: 2. for well-annotated tools with multiple operations (e.g. toolkits), and 2.5 for well-annotated tools with integrated functionality (e.g. workflows).

A corresponding quality metric (C.) should be added: 12.5. 2.5. ÷ 3. -- i.e. # of functions per # of entries, as well as a corresponding progress metric (D.), and a corresponding per-entry quality metric (E.).
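The difference between 2. and 2.5 can be sketched for a single entry as below. The field names (`function`, `operation`, `input`, `output`, `data`, `uri`) are again assumptions about the entry layout.

```python
ROOT_DATA = "data_0006"            # EDAM root concept "Data"
ROOT_OPERATION = "operation_0004"  # EDAM root concept "Operation"


def specific(uris: list, root: str) -> list:
    """Keep only non-empty URIs that are not the given EDAM root concept."""
    return [u for u in uris if u and not u.endswith(root)]


def count_operations_and_functions(entry: dict) -> tuple:
    """Return (metric 2., metric 2.5) for one entry."""
    operations = functions = 0
    for block in entry.get("function", []):
        op_uris = [o.get("uri", "") for o in block.get("operation", [])]
        data_uris = [io.get("data", {}).get("uri", "")
                     for key in ("input", "output")
                     for io in block.get(key, [])]
        concepts = specific(op_uris, ROOT_OPERATION)
        # Blocks without any specific input/output are ignored for both metrics.
        if concepts and specific(data_uris, ROOT_DATA):
            operations += 1            # one distinct operation (2.)
            functions += len(concepts)  # all its EDAM operation concepts (2.5)
    return operations, functions
```

A toolkit with two single-concept operations would score (2, 2), while a workflow with one operation carrying three concepts would score (1, 3).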

@joncison
Member

Very useful - thanks a million for this proposal Matus. Enhanced content reporting is in the roadmap (http://biotools.readthedocs.io/en/latest/changelog_roadmap.html) for Dec 16 and could include much of this.

ps. that more-or-less empty entry you pointed out was intentional: the BioExcel partners will be adding details in due course. We just needed to add them to bio.tools to allow a means for them to make edits. Really they should be in the "staging area" / marked as "beta" and this is in the roadmap for 2017 Q1.

@joncison added the "metrics and meta-registry" and "complex feature request" labels on Mar 13, 2017
@joncison
Member

@matuskalas - we should def. pick up on this later in the year once other higher priority things are out of the way. I label it as "complex" because while each individual thing is easy enough to do, there are lots of them.

@joncison
Member

joncison commented May 3, 2017

From #25:

  • breakdown of contributions by country / top-level domain (possibility for nice visualisation)

Also stats for each annotation:
• Publications (PMID, PMCID and DOI total)
• Contacts (i.e. number of emails and/or URLs)
• Documentation links
• Download links
• License
• Operating system
• Language
• Maturity

that's the key ones right now (given current content) I think?
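Coverage of these annotations could be reported as the number of entries in which each attribute is present and non-empty. A sketch, where the attribute names passed in are assumptions about how the fields are called in the entry JSON:

```python
def attribute_coverage(entries: list, attributes: list) -> dict:
    """For each attribute, count the entries where it is present and non-empty."""
    return {
        attribute: sum(1 for entry in entries if entry.get(attribute))
        for attribute in attributes
    }
```

Dividing each count by the number of entries would turn this into a percentage per attribute.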

@joncison
Member

joncison commented May 4, 2017

Countries / top-level domains
A plot of top-level domains would be nice, something like this, or even better, having these data mapped to a world map would be super-cool
[image: capture]

Institutes

  • Total #of contributing institutes (from analysis of user email domains)
  • breakdown (#entries / institute)
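Both breakdowns could be derived from user email addresses roughly as below. Treating the full email domain as the institute is a crude assumption (subdomains of one institute would be counted separately), and the addresses in the usage note are made up.

```python
from collections import Counter


def domain_breakdown(emails: list) -> tuple:
    """Return (count per email domain, count per top-level domain)."""
    # Ignore strings that are not email addresses at all.
    domains = [e.rsplit("@", 1)[1].lower() for e in emails if "@" in e]
    per_institute = Counter(domains)
    per_tld = Counter(d.rsplit(".", 1)[-1] for d in domains)
    return per_institute, per_tld
```

For example, `domain_breakdown(["a@institute-one.no", "b@institute-two.dk"])` yields one counter keyed by domain and one keyed by `no` and `dk`; `len(per_institute)` gives the total number of contributing institutes.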

@joncison
Member

@matuskalas - on Monday me and @ekry will finalise which of the above ideas will make it into the next revision of bio.tools/stats : do you have any more ideas to add? Thanks!

@joncison
Member

I'd like to hear some suggestions about what metrics we could get in light of the groupings in the information standard (https://github.com/bio-tools/biotoolsSchemaDocs/blob/master/information_requirement.rst), see bio-tools/biotoolsSchema#77

Specifically, aggregated metrics to capture things like project maturity & community, as evidenced by repos, documentation, mailing lists, etc.

@joncison self-assigned this on Sep 13, 2018
@joncison
Member

Additional such metrics are potentially an issue for biotoolsLint (https://github.com/bio-tools/biotoolsLint) to calculate.

@scapella

scapella commented Sep 13, 2018 via email

@joncison
Member

will do @scapella ... any metrics we calculate internally will be strictly in scope of what data we have in bio.tools. Stuff above is obviously only a small slice of all the different metrics we (== ELIXIR) have been considering.
