expose links to all export formats via Signposting #10542
pdurbin committed Nov 22, 2024
1 parent e32cfd8 commit 5a3291b
Showing 7 changed files with 84 additions and 16 deletions.
9 changes: 9 additions & 0 deletions doc/release-notes/10542-signposting.md
@@ -0,0 +1,9 @@
# Signposting Output Now Contains Links to All Dataset Metadata Export Formats

When Signposting was added in Dataverse 5.14 (#8981), it only provided links for the `schema.org` metadata export format.

The output of HEAD and GET requests for a dataset page, as well as the Signposting "linkset" API, has been updated to include links to all available dataset metadata export formats (including any external exporters, such as Croissant, that have been enabled).

This provides a lightweight, machine-readable way to first retrieve a list of links to each available metadata export format (via an HTTP HEAD request, for example) and then request the export format of interest.
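As an illustration of that two-step flow, here is a minimal client-side sketch (not part of Dataverse) that pulls the `describedby` entries out of a "Link" header value so a client can choose one export format and fetch it. The class name `DescribedByLinks` and its methods are hypothetical; it assumes every target URL is wrapped in `<...>`, contains no commas, and that `rel` and `type` are quoted parameters, which matches the example headers in the API Guide but is not a full RFC 8288 parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts "describedby" links from a Signposting "Link" header value.
 * Assumes <...>-wrapped URLs without commas and quoted rel/type parameters (Java 16+ for the record).
 */
public class DescribedByLinks {

    /** One link target with its rel and (possibly null) media type. */
    public record Link(String url, String rel, String type) {}

    // "<url>" followed by its ;-separated parameters, up to the next comma.
    private static final Pattern ENTRY = Pattern.compile("<([^>]*)>([^,]*)");
    // A single name="value" parameter such as rel="describedby".
    private static final Pattern PARAM = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    public static List<Link> parse(String headerValue) {
        List<Link> links = new ArrayList<>();
        Matcher entry = ENTRY.matcher(headerValue);
        while (entry.find()) {
            String rel = null;
            String type = null;
            Matcher param = PARAM.matcher(entry.group(2));
            while (param.find()) {
                if (param.group(1).equals("rel")) {
                    rel = param.group(2);
                } else if (param.group(1).equals("type")) {
                    type = param.group(2);
                }
            }
            links.add(new Link(entry.group(1), rel, type));
        }
        return links;
    }

    /** Keeps only the entries pointing at metadata descriptions of the dataset. */
    public static List<Link> describedBy(String headerValue) {
        List<Link> result = new ArrayList<>();
        for (Link link : parse(headerValue)) {
            if ("describedby".equals(link.rel())) {
                result.add(link);
            }
        }
        return result;
    }
}
```

A client would run this on the header from a HEAD request, pick the entry whose `type` (or `exporter=` value) it wants, and then GET that URL.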

See also [the docs](https://preview.guides.gdcc.io/en/develop/api/native-api.html#retrieve-signposting-information) and #10542.
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/admin/discoverability.rst
@@ -51,7 +51,7 @@ The Dataverse team has been working with Google on both formats. Google has `ind
Signposting
+++++++++++

The Dataverse software supports `Signposting <https://signposting.org>`_. This allows machines to request more information about a dataset through the `Link <https://tools.ietf.org/html/rfc5988>`_ HTTP header.
The Dataverse software supports `Signposting <https://signposting.org>`_. This allows machines to request more information about a dataset through the `Link <https://tools.ietf.org/html/rfc5988>`_ HTTP header. Links to all enabled metadata export formats are given. See :ref:`metadata-export-formats` for a list.

There are 2 Signposting profile levels, level 1 and level 2. In this implementation,
* Level 1 links are shown `as recommended <https://signposting.org/FAIR/>`_ in the "Link"
14 changes: 11 additions & 3 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1352,9 +1352,16 @@ The fully expanded example above (without environment variables) looks like this
curl "https://demo.dataverse.org/api/datasets/export?exporter=ddi&persistentId=doi:10.5072/FK2/J8SJZB"
.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org``, ``OAI_ORE``, ``Datacite``, ``oai_datacite`` and ``dataverse_json``. Descriptive names can be found under :ref:`metadata-export-formats` in the User Guide.
Available Dataset Metadata Exporters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dataset metadata exporters that ship with Dataverse are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org``, ``OAI_ORE``, ``Datacite``, ``oai_datacite`` and ``dataverse_json``. These are the strings to pass as ``$METADATA_FORMAT`` in the examples above. Descriptive names for each format can be found under :ref:`metadata-export-formats` in the User Guide.
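For clients that build the export URL in code rather than with curl, the following sketch assembles the same ``/api/datasets/export?exporter=...&persistentId=...`` URL shown in the curl example. The class ``ExportUrl`` and method ``build`` are hypothetical names. Unlike the curl example, it percent-encodes both query values; this assumes the server decodes query parameters, as standard servlet containers do, so ``doi:10.5072/FK2/J8SJZB`` and its encoded form should be equivalent.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: builds the dataset export API URL for one exporter name
 * (e.g. "ddi" or "schema.org") and one persistent ID.
 */
public class ExportUrl {

    public static String build(String siteUrl, String exporter, String persistentId) {
        // Encode both query values so characters such as ':' and '/' in the PID are unambiguous.
        return siteUrl + "/api/datasets/export?exporter="
                + URLEncoder.encode(exporter, StandardCharsets.UTF_8)
                + "&persistentId="
                + URLEncoder.encode(persistentId, StandardCharsets.UTF_8);
    }
}
```

For example, ``ExportUrl.build("https://demo.dataverse.org", "ddi", "doi:10.5072/FK2/J8SJZB")`` yields ``https://demo.dataverse.org/api/datasets/export?exporter=ddi&persistentId=doi%3A10.5072%2FFK2%2FJ8SJZB``.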

Additional exporters can be enabled, as described under :ref:`external-exporters` in the Installation Guide. They are listed under :ref:`inventory-of-external-exporters`.

To discover the machine-readable name of exporters (e.g. ``ddi``) that have been enabled on the installation of Dataverse you are using, you can use the Signposting "linkset" API documented under :ref:`signposting-api`.
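Once the ``describedby`` URLs have been collected from the linkset response (its ``href`` values) or from the "Link" header, the machine-readable exporter names can be read from the ``exporter`` query parameter. The sketch below is a hypothetical client-side helper, not part of Dataverse; it assumes every export link is an ``/api/datasets/export`` URL carrying ``exporter=...``, and it skips links without a query string, such as the DOI-based CSL JSON link.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: returns exporter names (e.g. "ddi", "schema.org")
 * found in the "exporter" query parameter of the given describedby URLs.
 */
public class ExporterNames {

    public static List<String> fromHrefs(List<String> hrefs) {
        List<String> names = new ArrayList<>();
        for (String href : hrefs) {
            String query = URI.create(href).getRawQuery();
            if (query == null) {
                continue; // e.g. https://doi.org/... (no query string, so no exporter)
            }
            for (String pair : query.split("&")) {
                if (pair.startsWith("exporter=")) {
                    String raw = pair.substring("exporter=".length());
                    names.add(URLDecoder.decode(raw, StandardCharsets.UTF_8));
                }
            }
        }
        return names;
    }
}
```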

.. note:: Additional exporters can be enabled, as described under :ref:`external-exporters` in the Installation Guide. To discover the machine-readable name of each exporter (e.g. ``ddi``), check :ref:`inventory-of-external-exporters` or ``getFormatName`` in the exporter's source code.
To discover the machine-readable name of exporters generally, check :ref:`inventory-of-external-exporters` or ``getFormatName`` in the exporter's source code.

Schema.org JSON-LD
^^^^^^^^^^^^^^^^^^
@@ -1368,6 +1375,7 @@ Both forms are valid according to Google's Structured Data Testing Tool at https

The standard has further evolved into a format called Croissant. For details, see :ref:`schema.org-head` in the Admin Guide.

The ``schema.org`` format changed after Dataverse 6.4 as well. Previously its content type was "application/json" but now it is "application/ld+json".
List Files in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~

@@ -2936,7 +2944,7 @@ Signposting involves the addition of a `Link <https://tools.ietf.org/html/rfc598

Here is an example of a "Link" header:

``Link: <https://doi.org/10.5072/FK2/YD5QDG>;rel="cite-as", <https://doi.org/10.5072/FK2/YD5QDG>;rel="describedby";type="application/vnd.citationstyles.csl+json",<https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/ld+json", <https://schema.org/AboutPage>;rel="type",<https://schema.org/Dataset>;rel="type", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.5072/FK2/YD5QDG>;rel="license", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset?persistentId=doi:10.5072/FK2/YD5QDG> ; rel="linkset";type="application/linkset+json"``
``Link: <https://doi.org/10.5072/FK2/YD5QDG>;rel="cite-as", <https://doi.org/10.5072/FK2/YD5QDG>;rel="describedby";type="application/vnd.citationstyles.csl+json",<https://demo.dataverse.org/api/datasets/export?exporter=OAI_ORE&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/json",<https://demo.dataverse.org/api/datasets/export?exporter=Datacite&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=oai_dc&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=oai_datacite&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/ld+json",<https://demo.dataverse.org/api/datasets/export?exporter=ddi&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=dcterms&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml",<https://demo.dataverse.org/api/datasets/export?exporter=html&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="text/html",<https://demo.dataverse.org/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/json",<https://demo.dataverse.org/api/datasets/export?exporter=oai_ddi&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/xml", <https://schema.org/AboutPage>;rel="type",<https://schema.org/Dataset>;rel="type", <http://creativecommons.org/publicdomain/zero/1.0>;rel="license", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset?persistentId=doi:10.5072/FK2/YD5QDG> ; rel="linkset";type="application/linkset+json"``

The URL for linkset information is discoverable under the ``rel="linkset";type="application/linkset+json"`` entry in the "Link" header, as in the example above.
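A client can locate that entry programmatically. The following is a minimal sketch with a hypothetical class name (``LinksetUrl``); it assumes, as in the example headers above, that each target URL is wrapped in ``<...>``, that entries are separated by commas, and that ``rel`` and ``type`` are quoted. Whitespace inside an entry's parameters is ignored, since the example header contains ``> ; rel="linkset"``.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: finds the linkset URL in a Signposting "Link" header value,
 * i.e. the entry carrying rel="linkset" and type="application/linkset+json".
 */
public class LinksetUrl {

    // "<url>" followed by its parameters, up to the next comma.
    private static final Pattern ENTRY = Pattern.compile("<([^>]*)>([^,]*)");

    public static Optional<String> find(String headerValue) {
        Matcher m = ENTRY.matcher(headerValue);
        while (m.find()) {
            // Drop spaces so "> ; rel=..." and ">;rel=..." are treated the same.
            String params = m.group(2).replace(" ", "");
            if (params.contains("rel=\"linkset\"")
                    && params.contains("type=\"application/linkset+json\"")) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }
}
```

A GET request with ``Accept: application/json`` to the returned URL should then yield the linkset JSON described in this section.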

2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/user/dataset-management.rst
@@ -43,6 +43,8 @@ Additional formats can be enabled. See :ref:`inventory-of-external-exporters` in

Each of these metadata exports contains the metadata of the most recently published version of the dataset.

For each dataset, links to each enabled metadata format are available programmatically via Signposting. For details, see :ref:`discovery-sign-posting` in the Admin Guide and :ref:`signposting-api` in the API Guide.

.. _adding-new-dataset:

Adding a New Dataset
@@ -111,7 +111,11 @@ public Boolean isAvailableToUsers() {

@Override
public String getMediaType() {
return MediaType.APPLICATION_JSON;
/**
* Changed from "application/json" to "application/ld+json" because
* that's what Signposting expects.
*/
return "application/ld+json";
}

}
@@ -16,6 +16,7 @@ Two configurable options allow changing the limit for the number of authors or d

import edu.harvard.iq.dataverse.*;
import edu.harvard.iq.dataverse.dataset.DatasetUtil;
import edu.harvard.iq.dataverse.export.ExportService;
import jakarta.json.Json;
import jakarta.json.JsonArrayBuilder;
import jakarta.json.JsonObjectBuilder;
@@ -28,6 +29,8 @@ Two configurable options allow changing the limit for the number of authors or d
import java.util.logging.Logger;

import static edu.harvard.iq.dataverse.util.json.NullSafeJsonBuilder.jsonObjectBuilder;
import io.gdcc.spi.export.ExportException;
import io.gdcc.spi.export.Exporter;

public class SignpostingResources {
private static final Logger logger = Logger.getLogger(SignpostingResources.class.getCanonicalName());
@@ -72,8 +75,18 @@ public String getLinks() {
}

String describedby = "<" + ds.getGlobalId().asURL().toString() + ">;rel=\"describedby\"" + ";type=\"" + "application/vnd.citationstyles.csl+json\"";
describedby += ",<" + systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=schema.org&persistentId="
+ ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier() + ">;rel=\"describedby\"" + ";type=\"application/ld+json\"";
ExportService instance = ExportService.getInstance();
for (String[] labels : instance.getExportersLabels()) {
String formatName = labels[1];
Exporter exporter;
try {
exporter = ExportService.getInstance().getExporter(formatName);
describedby += ",<" + systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=" + formatName + "&persistentId="
+ ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier() + ">;rel=\"describedby\"" + ";type=\"" + exporter.getMediaType() + "\"";
} catch (ExportException ex) {
logger.warning("Could not look up exporter based on " + formatName + ". Exception: " + ex);
}
}
valueList.add(describedby);

String type = "<https://schema.org/AboutPage>;rel=\"type\"";
@@ -112,15 +125,25 @@ public JsonArrayBuilder getJsonLinkset() {
)
);

mediaTypes.add(
jsonObjectBuilder().add(
"href",
systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=schema.org&persistentId=" + ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier()
).add(
"type",
"application/ld+json"
)
);
ExportService instance = ExportService.getInstance();
for (String[] labels : instance.getExportersLabels()) {
String formatName = labels[1];
Exporter exporter;
try {
exporter = ExportService.getInstance().getExporter(formatName);
mediaTypes.add(
jsonObjectBuilder().add(
"href",
systemConfig.getDataverseSiteUrl() + "/api/datasets/export?exporter=" + formatName + "&persistentId=" + ds.getProtocol() + ":" + ds.getAuthority() + "/" + ds.getIdentifier()
).add(
"type",
exporter.getMediaType()
)
);
} catch (ExportException ex) {
logger.warning("Could not look up exporter based on " + formatName + ". Exception: " + ex);
}
}
JsonArrayBuilder linksetJsonObj = Json.createArrayBuilder();

JsonObjectBuilder mandatory;
22 changes: 22 additions & 0 deletions src/test/java/edu/harvard/iq/dataverse/api/SignpostingIT.java
@@ -58,6 +58,16 @@ public void testSignposting() {
Response getHtml = given().get(datasetLandingPage);

System.out.println("Link header: " + getHtml.getHeader("Link"));
if (false) {
// Split on commas to make the output more readable.
System.out.println("---");
String header = getHtml.getHeader("Link");
for (String string : header.split(",")) {
System.out.println(string + ",");
}
System.out.println("returning early...");
return;
}

getHtml.then().assertThat().statusCode(OK.getStatusCode());

@@ -67,6 +77,8 @@ public void testSignposting() {
assertTrue(linkHeader.contains(datasetPid));
assertTrue(linkHeader.contains("cite-as"));
assertTrue(linkHeader.contains("describedby"));
// Make sure we get more exporters besides just "schema.org".
assertTrue(linkHeader.contains("oai_datacite"));

Response headHtml = given().head(datasetLandingPage);

@@ -76,6 +88,7 @@ public void testSignposting() {

// Make sure there's Signposting stuff in the "Link" header such as
// the dataset PID, cite-as, etc.
// TODO: The comment above is a repeat and so are some of the assertions below. Consolidate?
linkHeader = getHtml.getHeader("Link");
assertTrue(linkHeader.contains(datasetPid));
assertTrue(linkHeader.contains("cite-as"));
@@ -90,8 +103,10 @@ public void testSignposting() {
System.out.println("Linkset URL: " + linksetUrl);

Response linksetResponse = given().accept(ContentType.JSON).get(linksetUrl);
linksetResponse.prettyPrint();

String responseString = linksetResponse.getBody().asString();
System.out.println("response string: " + responseString);

JsonObject data = JsonUtil.getJsonObject(responseString);
JsonObject lso = data.getJsonArray("linkset").getJsonObject(0);
@@ -107,6 +122,13 @@ public void testSignposting() {
Pattern exporterPattern = Pattern.compile("[<\\[][^()\\[\\]]*?exporter=schema.org[^()\\[\\]]*[>\\]]");
Matcher exporterMatcher = exporterPattern.matcher(linkHeader);
exporterMatcher.find();
// TODO: make an assertion
//assertTrue(exporterMatcher.find());

// Test another
Pattern exporterPattern2 = Pattern.compile("exporter=oai_datacite");
Matcher exporterMatcher2 = exporterPattern2.matcher(linkHeader);
assertTrue(exporterMatcher2.find());

Response exportDataset = UtilIT.exportDataset(datasetPid, "schema.org");
exportDataset.prettyPrint();