Signposting support in Dataverse #5962
After some discussion with @hvdsomp at DANS (one of the authors of Signposting), we came up with the following specs for integrating Signposting in Dataverse. Please feel free to comment on this proposal and share ideas for improvement.
The trick in Signposting is to add a header with key
Signposting has various so-called 'patterns'. Aggregating these patterns may result in a large
Patterns
Author pattern
https://signposting.org/author/
For every author in a dataset, this will try to add one
In general:
Example: https://dataverse.nl/dataset.xhtml?persistentId=hdl:10411/DCXGYS contains 4 authors with an ORCID. Hence we expect a
Identifier pattern
https://signposting.org/identifier/
This will add
If the files have their own persistent identifier, the landing page should contain a similar
If a dataset has multiple identifiers (e.g. DOI, Handle, etc.), an entry like the one above can be added (comma separated) for each identifier.
Example (1): https://dataverse.nl/dataset.xhtml?persistentId=hdl:10411/KYCKAI has a Handle and 2 files. The files don't have a persistent identifier. Hence we expect the following
Example (2): https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BUPJNR has a DOI and 49 files. The files have their own persistent identifier. Hence we expect the following
and we expect the following
Bibliographic pattern
https://signposting.org/bibliographic_metadata/
This will add
Example:
Additional citation formats could be added if your https://dataverse.nl/dataset.xhtml?persistentId=hdl:10411/YDUA7T has 3 citation formats (EndNote, RIS and BibTeX), as well as 5 (suitable) metadata export formats (Dublin Core, DDI, DataCite, OAI_ORE and Schema.org JSON-LD). Hence we expect a
Conversely, all these URLs should have their own
All of this holds similarly for the landing pages of every file in the dataset, but with the URLs and metadata formats for the file.
Resource Type pattern
https://signposting.org/resource_type/
This will add a
Example (1):
Example (2):
Publication Boundary pattern (1)
https://signposting.org/publication_boundary/
This will add an
Example:
Conversely, on each of these file landing pages we expect a
Warning:
Publication Boundary pattern (2)
https://signposting.org/publication_boundary/
This will add an
Example:
Conversely, on the download URL we expect a
Signposting by value vs. by reference
As mentioned before, the number of relations proposed in the above section will cause the
One approach would be to implement only a subset of these patterns and relations. For example:
An obvious drawback of this is the loss of relations that might help in the discoverability of the datasets and files. An alternative to this so-called by value implementation is the by reference implementation that uses the In the by reference implementation we provide only one relation in Example:
Please note that the URL is made up and is very much open for discussion. Following this URL will return a formatted text like:
Note that now every relation also has an
One of the major advantages of the by reference implementation is that we do not have to provide just a subset of all relations. With this we can provide as many relations as we want, without worrying about the length of the
We may want to consider implementing Signposting on dataset and file landing pages by reference, whereas on all other links (such as API calls) the by value approach will suffice, since these mostly contain just one relation. |
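To make the by value versus by reference trade-off concrete, here is a minimal Python sketch. It assumes the typed links are serialized in the `Link` header syntax of RFC 8288 and that the by reference variant points to a link set through the IANA-registered `linkset` relation type. The names `TypedLink`, `format_link_header` and `choose_links`, the `max_length` threshold and all URLs are hypothetical and only serve as illustration; this is not Dataverse code.

```python
from typing import NamedTuple, Optional


class TypedLink(NamedTuple):
    """One typed link: target URL, relation type, optional media type."""
    target: str
    rel: str
    media_type: Optional[str] = None


def format_link_header(links):
    """Serialize typed links as a single Link header value (by value)."""
    parts = []
    for link in links:
        part = f'<{link.target}>; rel="{link.rel}"'
        if link.media_type:
            part += f'; type="{link.media_type}"'
        parts.append(part)
    return ", ".join(parts)


def choose_links(links, linkset_url, max_length=4000):
    """Return the by-value header if it fits, else one by-reference link.

    max_length is an assumed limit; real limits depend on the server setup.
    """
    by_value = format_link_header(links)
    if len(by_value) <= max_length:
        return by_value
    # Too long: expose a single link to a document listing all relations.
    return format_link_header([TypedLink(linkset_url, "linkset")])
```

A landing page with only a handful of relations would keep the by value header, while a dataset with many files would fall back to the single `linkset` link.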
@rvanheest thanks for the detailed write up! I guess I have a few questions:
Thanks! |
Personally I don't have much experience with the Dataverse code base yet, so I'm not sure if I'd be the one to implement this. But I'll check on this within DANS and let you know. On the other hand, would it be possible for Harvard (or another organisation) to do the implementation? In that case I'd be glad to help out with any further questions or advice regarding Signposting.
I guess you could make it so that this feature can be disabled. However, I don't see the point of that. Signposting is considered to be the 'low-hanging fruit' of machine2machine discoverability in archives/repositories, so I'm wondering if there would be a special use case in which this feature is to be disabled? The only reason would be the by value issues and concerns with server configurations related to the length of the Of course you could make this as fancy as you would like it to be. In a very rich version you could imagine even having a UI panel where you can select which link relations are to be enabled/disabled.
In our implementation in EASY we did not experience any performance issues with this. Now of course a huge number of relations in the |
Harvard/IQSS has a long to-do list at https://www.iq.harvard.edu/roadmap-dataverse-project so it would be great if we could identify an external contributor who is willing to make a pull request. I'm happy to mentor anyone to get up to speed with hacking on Dataverse. 😄
For the initial version I think all these relations should be hard coded. Later, they could be configurable.
Sorry, I meant the lookups from the Dataverse database. You're right, we could do some caching if necessary. I just wouldn't want this feature that many people haven't heard of (yet) to slow down their installation of Dataverse in any way. That's why I was suggesting a toggle to turn the feature off if it isn't desired. At this point I would suggest starting a "Signposting" thread at https://groups.google.com/forum/#!forum/dataverse-community that explains the benefit and points to this issue so that more people in the Dataverse community can learn about what it is. |
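As a rough illustration of the two ideas mentioned here (a toggle to switch the feature off, and caching of the database lookups), consider the following Python sketch. Everything in it is hypothetical: `load_relations_from_database` merely stands in for a real lookup, the `enabled` parameter stands in for an installation setting, and the URL is a placeholder. It is not how Dataverse is implemented.

```python
from functools import lru_cache

LOOKUPS = {"count": 0}  # counts simulated database lookups, for illustration


def load_relations_from_database(dataset_id):
    """Stand-in for the (potentially slow) lookup of a dataset's relations."""
    LOOKUPS["count"] += 1
    return ((f"https://example.org/dataset/{dataset_id}/export/ddi", "describedby"),)


@lru_cache(maxsize=1024)
def cached_relations(dataset_id):
    """Remember the relations per dataset so repeat requests skip the lookup."""
    return load_relations_from_database(dataset_id)


def link_header_for(dataset_id, enabled=True):
    """Return the Link header value, or None when the feature is switched off."""
    if not enabled:
        return None
    return ", ".join(
        f'<{url}>; rel="{rel}"' for url, rel in cached_relations(dataset_id)
    )
```

A real implementation would also need to invalidate the cache whenever a dataset's metadata changes, for instance on publication of a new version.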
Would it perhaps be more efficient/expressive to use a Link "meta" header pointing to a metadata URI, rather than having many more precise links to subsets of metadata? |
Apologies - though the "meta" Link header is still in the W3C's blog showing the utility of Link headers, it looks like it never made it through the standardization process. It appears that it was replaced by "describedBy" (https://www.iana.org/assignments/link-relations/link-relations.xhtml). It seems that this would point to a "generic" document containing all metadata about the subject. |
Folks building a tool to assess the "FAIRness" of datasets in Dataverse repositories (https://www.fairsfair.eu/fairsfair-data-object-assessment-metrics-request-comments) recommended that Dataverse implement Signposting. Typed links are one of several ways their tools look for information about datasets in Dataverse-based repositories. |
Here's some more information from the FAIRsFAIR team / @kitchenprinzessin3880:
I'm currently working on a grant proposal together with DANS, Harvard and others for funding to upgrade DataverseNO, and we might consider including a Signposting implementation in Dataverse in one of the work packages. Does anyone have an idea of roughly how many resources (expressed in person-months) this would require? |
Ha! That draft wasn't really public yet, but, hey, I guess it is now :-) The gist of the material is there, but I am still working on the examples at the end of the document. Feedback on the draft is very welcome, of course! Great to hear that you might look into implementing the FAIR Signposting Profile for Dataverse. That would be super, of course. I am not a Dataverse expert at all, so it is hard for me to assess the resources required to implement, but you would be looking at:
|
@hvdsomp would you like to create a Google Doc version of https://signposting.org/FAIR/, so that we can provide feedback on the work? Or do you prefer email instead? I think the Google Doc option is better, as the work is applicable to other repositories collaborating with the FAIRsFAIR team. @philippconzett regarding the resources required, my software architect @uschindler took a day or so to implement the previous patterns ;) If you need further input on the technical implementation, please do not hesitate to get in touch with him. |
@hvdsomp Sorry for that! I wasn't aware that the webpage wasn't public yet. Thanks for further input! @kitchenprinzessin3880 Thanks for this feedback. I guess then that this will be rather easy to implement once Dataverse has |
@kitchenprinzessin3880 I am OK to upload a version of the document into Google docs but will need to figure out how to do that as it's HTML with external CSS etc. Alternatively, we could just create a blank Google doc to provide feedback in, referring to section/paragraph in the doc? |
@hvdsomp yes external references can be problematic. how about i create an issue at https://github.com/pangaea-data-publisher/fuji and add my feedback there? i will share issue link to other project collaborators. what do you say? |
It's nice to see some chatter here. 😄 I'd feel remiss if I didn't mention that back in January it was a pleasure to meet @hvdsomp at PIDapalooza (when one could still travel!) where I also had a brief chat with Martin Klein, whom I asked about ResourceSync. In terms of time to implement, Martin said Signposting should be much easier since (from my understanding), you're just making some headers available. The devil is in the details, I'm sure, but there seems to be enough demand that we should probably try to get Signposting into Dataverse someday. 😄 |
Indeed, @pdurbin it was good to chat at PIDapalooza, back when that was still possible in person. I now have a significantly reworked draft available at http://signposting.org/FAIR/ . Feedback invited via https://github.com/hvdsomp/signposting . I hope the Dataverse community can work towards embracing this approach. |
I just noticed that Signposting support is set out as a desired characteristic in the COAR Community Framework for Good Practices in Repositories (https://doi.org/10.5281/zenodo.4110829); cf.:
|
Indeed. BTW, implementation of the FAIR Signposting Profile (Level 1 and Level 2) for Dataverse is currently ongoing at DANS. |
Signposting is an approach to make the scholarly web more friendly to machines. It uses Typed Links as a means to clarify patterns that occur repeatedly in scholarly portals. For resources of any media type, these typed links are provided in HTTP Link headers.
At DANS we implemented Signposting in EASY on our dataset landing pages with patterns for 'bibliographic metadata' and 'identifiers'. See https://signposting.org/adopters/#dans for examples.
It would be nice to have Signposting implemented in Dataverse as well. We have to see which patterns are suitable. Let's discuss that in this issue.
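To illustrate what a machine client gains from typed links in HTTP Link headers, here is a minimal Python sketch that parses a Link header value and picks out the targets for one relation type. It assumes the header follows the RFC 8288 syntax; `parse_link_header` and `targets_for` are hypothetical helper names, the regular expressions cover only the common cases, and the URLs in the usage note are placeholders.

```python
import re

# One link: <target> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[^;,="\s]+(?:\s*=\s*(?:"[^"]*"|[^;,"\s]+))?)*)'
)
# One parameter: ;name, optionally followed by ="quoted" or =token.
_PARAM_RE = re.compile(
    r';\s*([^;,="\s]+)(?:\s*=\s*(?:"([^"]*)"|([^;,"\s]+)))?'
)


def parse_link_header(value):
    """Return a list of (target, params) pairs from a Link header value."""
    links = []
    for target, raw_params in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            # Keep the first occurrence of a parameter, ignore repeats.
            params.setdefault(name.lower(), quoted or bare)
        links.append((target, params))
    return links


def targets_for(value, rel):
    """Return all link targets whose rel parameter includes the given type."""
    rel = rel.lower()
    return [
        target
        for target, params in parse_link_header(value)
        if rel in params.get("rel", "").lower().split()
    ]
```

For example, given a header value such as `<https://example.org/file/1>; rel="item"; type="text/csv"`, calling `targets_for(value, "item")` returns the file URL, so a crawler can find the content resources without scraping the landing page HTML.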