Merge pull request #8981 from GlobalDataverseCommunityConsortium/GDCC/Signposting

Gdcc/Signposting
kcondon authored Mar 24, 2023
2 parents 51ff682 + 13dc1fa commit 89c5add
Showing 15 changed files with 592 additions and 11 deletions.
8 changes: 8 additions & 0 deletions doc/release-notes/8424-signposting.md
@@ -0,0 +1,8 @@
# Signposting for Dataverse

This release adds [Signposting](https://signposting.org/) support to Dataverse to improve machine discoverability of datasets and files.

The following MicroProfile Config options are now available (these can be treated as JVM options):

- dataverse.signposting.level1-author-limit
- dataverse.signposting.level1-item-limit
76 changes: 76 additions & 0 deletions doc/sphinx-guides/source/admin/discoverability.rst
@@ -0,0 +1,76 @@
Discoverability
===============

Datasets are made discoverable by a variety of methods.

.. contents:: |toctitle|
   :local:

DataCite Integration
--------------------

If you are using `DataCite <https://datacite.org>`_ as your DOI provider, when datasets are published, metadata is pushed to DataCite, where it can be searched. For more information, see :ref:`:DoiProvider` in the Installation Guide.

OAI-PMH (Harvesting)
--------------------

The Dataverse software supports a protocol called OAI-PMH that facilitates harvesting dataset metadata from one system into another. For details on harvesting, see the :doc:`harvestserver` section.

Machine-Readable Metadata on Dataset Landing Pages
--------------------------------------------------

As recommended in `A Data Citation Roadmap for Scholarly Data Repositories <https://doi.org/10.1101/097196>`_, the Dataverse software embeds metadata on dataset landing pages in a variety of machine-readable ways.

Dublin Core HTML Meta Tags
++++++++++++++++++++++++++

The HTML source of a dataset landing page includes "DC" (Dublin Core) ``<meta>`` tags such as the following::

    <meta name="DC.identifier" content="...">
    <meta name="DC.type" content="Dataset">
    <meta name="DC.title" content="...">
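For illustration, a harvester could read these tags with something like the following Java sketch. It is a minimal, regex-based example that assumes the ``name`` attribute precedes ``content`` and that both use double quotes, as in the tags above; the class ``DublinCoreMetaReader`` is hypothetical, and a production client would use a real HTML parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that collects Dublin Core <meta> tags from landing-page HTML. */
final class DublinCoreMetaReader {

    // Assumes name="DC.xxx" comes before content="..." and both use double quotes.
    private static final Pattern DC_META = Pattern.compile(
            "<meta\\s+name=\"(DC\\.[^\"]+)\"\\s+content=\"([^\"]*)\"");

    private DublinCoreMetaReader() {
    }

    /** Maps each DC element (e.g. "DC.title") to its values, in document order. */
    static Map<String, List<String>> read(String html) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = DC_META.matcher(html);
        while (m.find()) {
            // Elements such as DC.creator may repeat, so values are collected in a list.
            // Entity references such as &amp; are left as-is.
            result.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
        }
        return result;
    }
}
```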

Schema.org JSON-LD Metadata
+++++++++++++++++++++++++++

The HTML source of a dataset landing page includes Schema.org JSON-LD metadata like this::

    <script type="application/ld+json">{"@context":"http://schema.org","@type":"Dataset","@id":"https://doi.org/...
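A client that only needs the JSON-LD could pull it out of the page as in this Java sketch. The class ``JsonLdExtractor`` is hypothetical, the pattern assumes the exact ``type="application/ld+json"`` attribute shown above, and the returned string is not validated as JSON.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that returns the first JSON-LD block embedded in an HTML page. */
final class JsonLdExtractor {

    // (?s) lets '.' span newlines; the reluctant .*? stops at the first closing </script>.
    private static final Pattern LD_JSON = Pattern.compile(
            "(?s)<script\\s+type=\"application/ld\\+json\"\\s*>(.*?)</script>");

    private JsonLdExtractor() {
    }

    /** Returns the trimmed script body, or empty if the page has no JSON-LD block. */
    static Optional<String> extract(String html) {
        Matcher m = LD_JSON.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }
}
```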


.. _discovery-sign-posting:

Signposting
+++++++++++

The Dataverse software supports `Signposting <https://signposting.org>`_. This allows machines to request more information about a dataset through the `Link <https://tools.ietf.org/html/rfc5988>`_ HTTP header.

There are two Signposting profile levels, level 1 and level 2. In this implementation:

* Level 1 links are shown `as recommended <https://signposting.org/FAIR/>`_ in the "Link"
  HTTP header, which can be fetched by sending an HTTP HEAD request, e.g. ``curl -I https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/KPY4ZC``.
  The number of author and file links in the level 1 header can be configured as described below.
* The level 2 linkset can be fetched from the dedicated linkset URL for
  that dataset version. That URL appears among the level 1 links with ``rel="linkset"``.
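The following Java sketch shows one way a client might split a level 1 "Link" header into entries and locate the ``rel="linkset"`` URL. It is a simplified parser written for this illustration (the ``LinkHeader`` class is hypothetical, and escaped quotes inside parameter values are not handled), not code from Dataverse.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified parser for an HTTP "Link" header value (RFC 8288 syntax). */
final class LinkHeader {

    /** One link-value: the target URI plus its parameters (rel, type, ...). */
    static final class Link {
        final String uri;
        final Map<String, String> params;

        Link(String uri, Map<String, String> params) {
            this.uri = uri;
            this.params = params;
        }

        /** True if rel lists the given relation type (rel may hold several, space-separated). */
        boolean hasRel(String rel) {
            String value = params.get("rel");
            if (value == null) {
                return false;
            }
            for (String token : value.trim().split("\\s+")) {
                if (token.equalsIgnoreCase(rel)) {
                    return true;
                }
            }
            return false;
        }
    }

    private LinkHeader() {
    }

    /** Parses a header value; entries that do not start with <URI> are skipped. */
    static List<Link> parse(String header) {
        List<Link> links = new ArrayList<>();
        for (String part : splitOutside(header, ',')) {
            String p = part.trim();
            int close = p.indexOf('>');
            if (!p.startsWith("<") || close < 0) {
                continue;
            }
            Map<String, String> params = new LinkedHashMap<>();
            for (String param : splitOutside(p.substring(close + 1), ';')) {
                int eq = param.indexOf('=');
                if (eq < 0) {
                    continue; // e.g. the empty piece before the first ';'
                }
                String key = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
                String value = param.substring(eq + 1).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                params.put(key, value);
            }
            links.add(new Link(p.substring(1, close), params));
        }
        return links;
    }

    /** Returns the URI of the first link whose rel includes "linkset", if present. */
    static Optional<String> linksetUri(String header) {
        return parse(header).stream().filter(l -> l.hasRel("linkset")).map(l -> l.uri).findFirst();
    }

    // Splits on sep, ignoring separators that appear inside <...> or "...".
    private static List<String> splitOutside(String s, char sep) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inAngle = false;
        boolean inQuote = false;
        for (char c : s.toCharArray()) {
            if (c == '"' && !inAngle) {
                inQuote = !inQuote;
            } else if (c == '<' && !inQuote) {
                inAngle = true;
            } else if (c == '>' && !inQuote) {
                inAngle = false;
            }
            if (c == sep && !inAngle && !inQuote) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }
}
```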

Note: Authors without an identifier URL are neither counted nor shown in any profile or linkset.

The following configuration options are available:

- :ref:`dataverse.signposting.level1-author-limit`

  Sets the maximum number of authors shown in the level 1 profile.
  If the number of authors (with identifier URLs) exceeds this value, no author links are shown in the level 1 profile.
  The default is 5.

- :ref:`dataverse.signposting.level1-item-limit`

  Sets the maximum number of items (files) shown in the level 1 profile. If a dataset has
  more files than this, no file links are shown in the level 1 profile; they appear only in the level 2 linkset.
  The default is 5.
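The limit rules above can be summarized in a short Java sketch. The ``Level1Limits`` class is hypothetical and is not Dataverse's ``SignpostingResources`` implementation; falling back to 5 for a blank or non-numeric value is an assumption, motivated by the Java changes in this commit passing ``""`` (via ``orElse("")``) when a limit is not configured.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the documented level 1 limit rules (not SignpostingResources). */
final class Level1Limits {

    static final int DEFAULT_LIMIT = 5; // documented default for both settings

    private Level1Limits() {
    }

    /** Parses a configured limit; blank or non-numeric input falls back to the default (assumption). */
    static int parseLimit(String configured) {
        if (configured == null || configured.isBlank()) {
            return DEFAULT_LIMIT;
        }
        try {
            return Integer.parseInt(configured.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_LIMIT;
        }
    }

    /** Drops authors without an identifier URL; above the limit, returns no author links at all. */
    static List<String> authorLinks(List<String> authorIdentifierUrls, int limit) {
        List<String> withUrls = authorIdentifierUrls.stream()
                .filter(url -> url != null && !url.isBlank())
                .collect(Collectors.toList());
        return withUrls.size() > limit ? List.of() : withUrls;
    }

    /** Above the limit, no file links go into level 1 (they stay in the level 2 linkset). */
    static List<String> fileLinks(List<String> fileUrls, int limit) {
        return fileUrls.size() > limit ? List.of() : fileUrls;
    }
}
```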

See also :ref:`signposting-api` in the API Guide.

Additional Discoverability Through Integrations
-----------------------------------------------

See :ref:`integrations-discovery` in the Integrations section for additional discovery methods you can enable.
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/admin/index.rst
@@ -14,6 +14,7 @@ This guide documents the functionality only available to superusers (such as "da

dashboard
external-tools
discoverability
harvestclients
harvestserver
metadatacustomization
9 changes: 3 additions & 6 deletions doc/sphinx-guides/source/admin/integrations.rst
@@ -179,15 +179,12 @@ Avgidea Data Search

Researchers can use a Google Sheets add-on to search for CSV data in Dataverse installations and then import that data into a sheet. See `Avgidea Data Search <https://www.avgidea.io/avgidea-data-platform.html>`_ for details.

.. _integrations-discovery:

Discoverability
---------------

Integration with `DataCite <https://datacite.org>`_ is built into the Dataverse Software. When datasets are published, metadata is sent to DataCite. You can further increase the discoverability of your datasets by setting up additional integrations.

OAI-PMH (Harvesting)
++++++++++++++++++++

The Dataverse Software supports a protocol called OAI-PMH that facilitates harvesting datasets from one system into another. For details on harvesting, see the :doc:`harvestserver` section.
A number of built-in features related to data discovery are listed under :doc:`discoverability`, but you can further increase the discoverability of your data by setting up integrations.

SHARE
+++++
26 changes: 25 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -2084,10 +2084,34 @@ The response is a JSON object described in the :doc:`/api/external-tools` sectio
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
export VERSION=1.0
export TOOL_ID=1
curl -H "X-Dataverse-key: $API_TOKEN" -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/toolparams/$TOOL_ID?persistentId=$PERSISTENT_IDENTIFIER"

.. _signposting-api:

Retrieve Signposting Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dataverse supports :ref:`discovery-sign-posting` as a discovery mechanism.
Signposting consists of two parts: a `Link <https://tools.ietf.org/html/rfc5988>`__ HTTP header with summary information, returned on GET and HEAD requests for the dataset page, and a separate ``/linkset`` API call that returns additional information.

Here is an example of a "Link" header:

``Link: <https://doi.org/10.5072/FK2/YD5QDG>;rel="cite-as", <https://doi.org/10.5072/FK2/YD5QDG>;rel="describedby";type="application/vnd.citationstyles.csl+json",<https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/json+ld", <https://schema.org/AboutPage>;rel="type",<https://schema.org/Dataset>;rel="type", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.5072/FK2/YD5QDG>;rel="license", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset?persistentId=doi:10.5072/FK2/YD5QDG> ; rel="linkset";type="application/linkset+json"``

The URL for linkset information is discoverable under the ``rel="linkset";type="application/linkset+json"`` entry in the "Link" header, such as in the example above.
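As a hedged illustration, the header can also be read from Java 11+ with the standard ``java.net.http`` client. ``SignpostingHeadClient`` is a hypothetical name; the sketch sends a HEAD request so that no page body is transferred.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Optional;

/** Hypothetical client that reads the Signposting "Link" header of a dataset landing page. */
final class SignpostingHeadClient {

    private SignpostingHeadClient() {
    }

    /** Builds a HEAD request: the headers come back without the page body. */
    static HttpRequest headRequest(String landingPageUrl) {
        return HttpRequest.newBuilder(URI.create(landingPageUrl))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /**
     * Sends the HEAD request (network I/O) and returns the Link header. If the server sends
     * several Link fields, they are joined with ", ", which HTTP treats as equivalent.
     */
    static Optional<String> fetchLinkHeader(HttpClient client, String landingPageUrl)
            throws IOException, InterruptedException {
        HttpResponse<Void> response =
                client.send(headRequest(landingPageUrl), HttpResponse.BodyHandlers.discarding());
        List<String> values = response.headers().allValues("Link");
        return values.isEmpty() ? Optional.empty() : Optional.of(String.join(", ", values));
    }
}
```

For example, ``SignpostingHeadClient.fetchLinkHeader(HttpClient.newHttpClient(), "https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/YD5QDG")`` would return the header value for a published version; for a draft version no Signposting header is expected.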

The response from the ``/linkset`` endpoint includes a JSON object conforming to the `Signposting <https://signposting.org>`__ specification.
Signposting is not supported for draft dataset versions.

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
  export VERSION=1.0

  curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/linkset?persistentId=$PERSISTENT_IDENTIFIER"

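The same call can be made from Java 11+; this sketch only builds the request. ``LinksetRequest`` is a hypothetical helper, and URL-encoding the persistent identifier is a choice made here (the curl example passes it unencoded, which also works because ``:`` and ``/`` are allowed in a query string).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical Java equivalent of the curl call above (request construction only). */
final class LinksetRequest {

    private LinksetRequest() {
    }

    /** Builds the linkset URL; the PID is URL-encoded because it contains ':' and '/'. */
    static URI linksetUri(String serverUrl, String persistentId, String version) {
        if (":draft".equals(version)) {
            // Mirrors the API, which answers 400 Bad Request for the draft version.
            throw new IllegalArgumentException("Signposting is not supported for draft versions");
        }
        String pid = URLEncoder.encode(persistentId, StandardCharsets.UTF_8);
        return URI.create(serverUrl + "/api/datasets/:persistentId/versions/" + version
                + "/linkset?persistentId=" + pid);
    }

    /** GET request asking for JSON, like curl -H "Accept:application/json". */
    static HttpRequest request(String serverUrl, String persistentId, String version) {
        return HttpRequest.newBuilder(linksetUri(serverUrl, persistentId, version))
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The request can then be sent with ``HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())``.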
Files
-----

1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/configuration.rst
@@ -93,6 +93,7 @@ sub-scopes first.
- All sub-scopes are below that.
- Scopes are separated by dots (periods).
- A scope may be a placeholder, filled with a variable during lookup. (Named object mapping.)
- The setting should be in kebab case (``signing-secret``) rather than camel case (``signingSecret``).
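To illustrate these rules, the following Java sketch models a few scopes as an enum in which each entry names its parent scope, so the full key is built by joining the chain with dots. ``DemoSettings`` is hypothetical and much smaller than the real ``JvmSettings`` enum, which also handles aliases, placeholders and ``lookup()`` methods.

```java
/** Hypothetical miniature of the scoped-settings pattern (not the real JvmSettings enum). */
enum DemoSettings {
    // The top-level scope has no parent.
    PREFIX(null, "dataverse"),
    // Sub-scopes name their parent scope; keys are kebab case.
    SCOPE_SIGNPOSTING(PREFIX, "signposting"),
    SIGNPOSTING_LEVEL1_AUTHOR_LIMIT(SCOPE_SIGNPOSTING, "level1-author-limit"),
    SIGNPOSTING_LEVEL1_ITEM_LIMIT(SCOPE_SIGNPOSTING, "level1-item-limit");

    private final DemoSettings parent;
    private final String key;

    DemoSettings(DemoSettings parent, String key) {
        this.parent = parent;
        this.key = key;
    }

    /** Joins the scope chain with dots, e.g. "dataverse.signposting.level1-author-limit". */
    String getScopedKey() {
        return parent == null ? key : parent.getScopedKey() + "." + key;
    }

    /** True for lower-case words joined by single hyphens, e.g. "signing-secret". */
    static boolean isKebabCase(String key) {
        return key.matches("[a-z0-9]+(-[a-z0-9]+)*");
    }
}
```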

Any consumer of the setting can choose to use one of the fluent ``lookup()`` methods, which hides away alias handling,
conversion, etc. from consuming code. See also the detailed Javadoc for these methods.
20 changes: 20 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -2153,6 +2153,26 @@ See also these related database settings:
- :ref:`:Authority`
- :ref:`:Shoulder`


.. _dataverse.signposting.level1-author-limit:

dataverse.signposting.level1-author-limit
+++++++++++++++++++++++++++++++++++++++++

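Sets the maximum number of authors (with identifier URLs) listed in the level 1 Signposting profile; above this number, no author links are shown. The default is 5. See :ref:`discovery-sign-posting` for details.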

Can also be set via any `supported MicroProfile Config API source`_, e.g. the environment variable ``DATAVERSE_SIGNPOSTING_LEVEL1_AUTHOR_LIMIT``.

.. _dataverse.signposting.level1-item-limit:

dataverse.signposting.level1-item-limit
+++++++++++++++++++++++++++++++++++++++

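Sets the maximum number of files listed in the level 1 Signposting profile; datasets with more files show file links only in the level 2 linkset. The default is 5. See :ref:`discovery-sign-posting` for details.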

Can also be set via any `supported MicroProfile Config API source`_, e.g. the environment variable ``DATAVERSE_SIGNPOSTING_LEVEL1_ITEM_LIMIT``.
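The environment variable names above follow the MicroProfile Config mapping rules for environment variables. The Java sketch below approximates that lookup (exact name, then every non-alphanumeric character replaced with ``_``, then the upper-cased form) against a plain ``Map``; ``MpConfigEnvNames`` is hypothetical, and the application itself relies on its MicroProfile Config implementation rather than code like this.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration of MicroProfile Config's environment-variable name matching. */
final class MpConfigEnvNames {

    private MpConfigEnvNames() {
    }

    /** Replaces every character that is not an ASCII letter, digit or '_' with '_'. */
    static String sanitize(String propertyName) {
        StringBuilder sb = new StringBuilder(propertyName.length());
        for (char c : propertyName.toCharArray()) {
            boolean keep = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '_';
            sb.append(keep ? c : '_');
        }
        return sb.toString();
    }

    /** Tries the exact name, then the sanitized name, then the sanitized name upper-cased. */
    static Optional<String> lookup(Map<String, String> env, String propertyName) {
        String sanitized = sanitize(propertyName);
        for (String candidate : List.of(propertyName, sanitized, sanitized.toUpperCase(Locale.ROOT))) {
            String value = env.get(candidate);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

In a running JVM, ``MpConfigEnvNames.lookup(System.getenv(), "dataverse.signposting.level1-author-limit")`` would find a value exported as ``DATAVERSE_SIGNPOSTING_LEVEL1_AUTHOR_LIMIT``.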


.. _feature-flags:

Feature Flags
19 changes: 17 additions & 2 deletions src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -143,6 +143,8 @@
import edu.harvard.iq.dataverse.search.SearchServiceBean;
import edu.harvard.iq.dataverse.search.SearchUtil;
import edu.harvard.iq.dataverse.search.SolrClientService;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.util.SignpostingResources;
import edu.harvard.iq.dataverse.util.FileMetadataUtil;
import java.util.Comparator;
import org.apache.solr.client.solrj.SolrQuery;
@@ -6046,8 +6048,7 @@ public boolean downloadingRestrictedFiles() {
}
return false;
}



//Determines whether this Dataset uses a public store and therefore doesn't support embargoed or restricted files
public boolean isHasPublicStore() {
return settingsWrapper.isTrueForKey(SettingsServiceBean.Key.PublicInstall, StorageIO.isPublicStore(dataset.getEffectiveStorageDriverId()));
@@ -6080,5 +6081,19 @@ public String getWebloaderUrlForDataset(Dataset d) {
return null;
}
}

    /**
     * Builds the Signposting level 1 links for the working version, used as the
     * value of the "Link" HTTP header on the dataset page.
     *
     * @return the header value, or {@code null} if the working version is not released
     */
    public String getSignpostingLinkHeader() {
        if (!workingVersion.isReleased()) {
            return null;
        }
        SignpostingResources sr = new SignpostingResources(systemConfig, workingVersion,
                JvmSettings.SIGNPOSTING_LEVEL1_AUTHOR_LIMIT.lookupOptional().orElse(""),
                JvmSettings.SIGNPOSTING_LEVEL1_ITEM_LIMIT.lookupOptional().orElse(""));
        return sr.getLinks();
    }

}
37 changes: 36 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -83,6 +83,7 @@
import edu.harvard.iq.dataverse.metrics.MetricsUtil;
import edu.harvard.iq.dataverse.makedatacount.MakeDataCountUtil;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean.Key;
import edu.harvard.iq.dataverse.util.ArchiverUtil;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.EjbUtil;
@@ -93,6 +94,8 @@
import edu.harvard.iq.dataverse.util.json.JSONLDUtil;
import edu.harvard.iq.dataverse.util.json.JsonLDTerm;
import edu.harvard.iq.dataverse.util.json.JsonParseException;
import edu.harvard.iq.dataverse.util.json.JsonPrinter;
import edu.harvard.iq.dataverse.util.SignpostingResources;
import edu.harvard.iq.dataverse.util.json.JsonUtil;
import edu.harvard.iq.dataverse.search.IndexServiceBean;

@@ -156,6 +159,7 @@
import org.glassfish.jersey.media.multipart.FormDataContentDisposition;
import org.glassfish.jersey.media.multipart.FormDataParam;
import com.amazonaws.services.s3.model.PartETag;
import edu.harvard.iq.dataverse.settings.JvmSettings;

@Path("datasets")
public class Datasets extends AbstractApiBean {
@@ -558,7 +562,38 @@ public Response getVersionMetadataBlock(@Context ContainerRequestContext crc,
return notFound("metadata block named " + blockName + " not found");
}, getRequestUser(crc));
}


    /**
     * Returns the Signposting level 2 linkset for a dataset version.
     *
     * @param crc the request context, used to identify the requesting user
     * @param datasetId the dataset id, or ":persistentId" together with a persistentId query parameter
     * @param versionId the version to describe, e.g. "1.0"; ":draft" is rejected
     * @param uriInfo request URI information, passed on to the version lookup
     * @param headers request headers, passed on to the version lookup
     * @return a JSON response holding the linkset under the "linkset" key, or 400 Bad Request for ":draft"
     */
    @GET
    @AuthRequired
    @Path("{id}/versions/{versionId}/linkset")
    public Response getLinkset(@Context ContainerRequestContext crc, @PathParam("id") String datasetId, @PathParam("versionId") String versionId, @Context UriInfo uriInfo, @Context HttpHeaders headers) {
        if (":draft".equals(versionId)) {
            return badRequest("Signposting is not supported on the :draft version");
        }
        User user = getRequestUser(crc);
        return response(req -> {
            DatasetVersion dsv = getDatasetVersionOrDie(req, versionId, findDatasetOrDie(datasetId), uriInfo, headers);
            return ok(Json.createObjectBuilder().add(
                    "linkset",
                    new SignpostingResources(
                            systemConfig,
                            dsv,
                            JvmSettings.SIGNPOSTING_LEVEL1_AUTHOR_LIMIT.lookupOptional().orElse(""),
                            JvmSettings.SIGNPOSTING_LEVEL1_ITEM_LIMIT.lookupOptional().orElse("")
                    ).getJsonLinkset()
                )
            );
        }, user);
    }

@GET
@AuthRequired
@Path("{id}/modifyRegistration")
@@ -67,6 +67,11 @@ public enum JvmSettings {
// API SETTINGS
SCOPE_API(PREFIX, "api"),
API_SIGNING_SECRET(SCOPE_API, "signing-secret"),

// SIGNPOSTING SETTINGS
SCOPE_SIGNPOSTING(PREFIX, "signposting"),
SIGNPOSTING_LEVEL1_AUTHOR_LIMIT(SCOPE_SIGNPOSTING, "level1-author-limit"),
SIGNPOSTING_LEVEL1_ITEM_LIMIT(SCOPE_SIGNPOSTING, "level1-item-limit"),

// FEATURE FLAGS SETTINGS
SCOPE_FLAGS(PREFIX, "feature"),