
Pivot to using Twig to get JSONLD instead of relying on the RDF module #1633

Open
dannylamb opened this issue Sep 30, 2020 · 14 comments
Labels
Subject: Linked Data related to linked data. Consider also using metadata or modelling tags. Type: documentation provides documentation or asks for documentation. Type: enhancement Identifies work on an enhancement to the Islandora codebase

Comments

@dannylamb
Contributor

We do a lot of work to get around the limitations of the RDF module. That, plus no one seems to like editing the YAML by hand.

Would using Twig templates to render out JSONLD be preferable here? The templates would have to live in a module, but they would give us some flexibility, like being able to bury metadata that's about the thing (and not the web content) under a # fragment. We could also preprocess the template to jam in the members and media views, so that info would be available as well.
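A minimal sketch of the JSON-LD shape the "# fragment" idea implies, written in Python purely for illustration (a real implementation would be a Twig template or PHP): predicates about the web page stay on the node URI, and descriptive metadata moves to a fragment resource linked from it. The fragment name `thing`, the linking predicate `schema:mainEntity`, and the example URIs are assumptions, not Islandora conventions.

```python
import json

CONTEXT = {"dcterms": "http://purl.org/dc/terms/", "schema": "http://schema.org/"}


def split_node_jsonld(node_uri, web_props, thing_props, fragment="thing",
                      link_predicate="schema:mainEntity"):
    """Separate the web page (node URI) from the thing it describes (URI#fragment).

    web_props:   predicates about the web content, e.g. created/modified dates.
    thing_props: descriptive predicates about the thing itself.
    `fragment` and `link_predicate` are illustrative defaults, not Islandora conventions.
    """
    thing_uri = f"{node_uri}#{fragment}"
    page = {"@id": node_uri, **web_props, link_predicate: {"@id": thing_uri}}
    thing = {"@id": thing_uri, **thing_props}
    return {"@context": CONTEXT, "@graph": [page, thing]}


doc = split_node_jsonld(
    "http://example.org/node/1",
    {"dcterms:modified": "2020-09-30T12:00:00Z"},
    {"dcterms:title": "Annual report", "dcterms:creator": "Jane Doe"},
)
print(json.dumps(doc, indent=2))
```

Keeping both resources in one `@graph` means a single request still returns everything, while consumers can tell statements about the page apart from statements about the object.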

I'm curious how folks feel about that approach more than anything else.

@mjordan
Contributor

mjordan commented Sep 30, 2020

@dannylamb this is a great idea.

@seth-shaw-unlv
Contributor

I'm not sure...

Yes, the RDF module has serious limitations that need to be addressed, but I think the general approach of 1) defining a mapping in config, 2) building the JSON-LD array structure based on the mapping, and 3) stringifying it is a sound strategy.
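The three-step strategy (mapping in config, build the structure, stringify) can be sketched in a few lines of Python, used here only for illustration. The mapping dict below is a hypothetical stand-in loosely shaped like an rdf.mapping YAML file, and the field names are assumptions; it is not the RDF module's actual code path.

```python
import json

# Hypothetical mapping in the spirit of an rdf.mapping YAML file:
# Drupal field name -> RDF properties its values are emitted under.
MAPPING = {
    "title": ["dcterms:title"],
    "field_description": ["dcterms:description"],
    "created": ["dcterms:created"],
}
CONTEXT = {"dcterms": "http://purl.org/dc/terms/"}


def build_jsonld(uri, fields, mapping=MAPPING):
    """Steps 1-2: walk the configured mapping and build the JSON-LD as plain data."""
    node = {"@id": uri}
    for field_name, properties in mapping.items():
        values = fields.get(field_name) or []
        for prop in properties:
            for value in values:
                node.setdefault(prop, []).append({"@value": value})
    return {"@context": CONTEXT, "@graph": [node]}


def serialize(doc):
    """Step 3: stringify; json.dumps takes care of quoting and escaping."""
    return json.dumps(doc, indent=2)


print(serialize(build_jsonld(
    "http://example.org/node/1",
    {"title": ["Annual report"], "created": ["2020-09-30"]},
)))
```

Empty or missing fields simply produce no property, so the mapping can list more fields than a given node actually fills.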

The main limitations we have are 1) a limited mapping capability (forcing us to rely on alter hooks) and 2) no UI for updating the config (forcing us to use Features or the core config editing capabilities).

I don't know that using Twig templates gains us anything on those two fronts. First, while we have a lot of Twig templates to build off of for creating HTML pages, I'm having a hard time imagining the Twig templates necessary to generate JSON-LD. I'm sure you could do it... but I'd much rather take an array and call json_encode. Second, mapping logic will still have to happen in code, just in template preprocessing functions. This means we lose the existing mapping configuration we already have, and we will probably still end up needing a bunch of template preprocess calls for customizations that aren't much different from the existing alter we already have. This strikes me as more difficult to customize for most users. (I suppose you could put the mapping logic into the template, but that is going to get ugly, and would that be any easier for site admins than updating a config file?)
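One concrete reason to prefer "build an array, then encode it" is escaping. The Python sketch below (illustration only; Python's `string.Template` stands in for a text template) shows a naive template producing invalid JSON as soon as a value contains a double quote, while serializing a data structure does not. Twig does ship a `json_encode` filter, so a careful template can avoid this, but every interpolated value then needs that treatment.

```python
import json
from string import Template

# Naive text templating: values are pasted into the JSON as raw text.
NAIVE = Template('{"@id": "$uri", "dcterms:title": "$title"}')


def render_naive(uri, title):
    return NAIVE.substitute(uri=uri, title=title)


def render_structured(uri, title):
    # Build the data first, then let the serializer handle escaping.
    return json.dumps({"@id": uri, "dcterms:title": title})


title = 'The "Lost" Letters'
for render in (render_naive, render_structured):
    text = render("http://example.org/node/1", title)
    try:
        json.loads(text)
        print(render.__name__, "-> valid JSON")
    except json.JSONDecodeError:
        print(render.__name__, "-> invalid JSON")
```

The naive version fails to parse because the quotes inside the title are not escaped.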

@seth-shaw-unlv
Contributor

I should also note that our recent work on converters shows that we may have an effective way of overcoming the RDF mapping shortcomings without alters, one that puts more of the power in the hands of config editors.

@DiegoPino
Contributor

DiegoPino commented Sep 30, 2020

This is VERY important (and surprising) for us (the Metro and Archipelago community, @giancarlobi @alliomeria). It intersects with our hard work (talking to, speaking with, and teaching the community about seeing metadata schemas as a flexible thing), and we wonder whether this means you are planning to go the Archipelago way, without stating it as such? Twig templates as the only source of metadata casting and exposure have been core to our architecture/system since 2018, and we have devoted a lot of engineering to making that happen, including endpoints, API exposure, extensions, caching, custom entities, etc. @seth-shaw-unlv has probably not seen this, but yes, JSON-LD, IIIF, Schema.org, GeoJSON, MODS, DC, etc. are all generated that way in our case/code. And this goes back much further, to IMI, and even to the time when Islandora 8 was Silex microservices.

@mjordan @dannylamb if you plan to make such a big architectural change in Islandora 9, and it overlaps that much with our approach, we would love for that to at least be publicly acknowledged as such. Is that the case?

@alxp
Contributor

alxp commented Sep 30, 2020

An alternate method I think we can consider is to make use of the Metatag config interface, with its plugin system for defining Groups and Tags.

An example, from my Schema.org Dataset module on Drupal:

You can define a field with just PHPDoc annotation, and inherit class methods only if you need to do something special with the output:

/**
 * Provides a plugin for the 'schema_dataset_contributor' meta tag.
 *
 * - 'id' should be a globally unique id.
 * - 'name' should match the Schema.org element name.
 * - 'group' should match the id of the group that defines the Schema.org type.
 *
 * @MetatagTag(
 *   id = "schema_dataset_contributor",
 *   label = @Translation("contributor"),
 *   description = @Translation("Contributor to the dataset"),
 *   name = "contributor",
 *   group = "schema_dataset",
 *   weight = 1,
 *   type = "string",
 *   secure = FALSE,
 *   multiple = TRUE
 * )
 */
class SchemaDatasetContributor extends SchemaPersonOrgBase {

}

While Schema.org Metatag intervenes in the header generation to output JSON-LD instead of head tags, we could intervene one step further to take some subset of the JSON-LD and put it elsewhere.

In our RDM site, we then export a config that specifies which field to get the contributor name from; in our case it reaches into a paragraph:

[node:field_rdm_contributors:entity:field_rdm_person:entity:field_rdm_personal_name]
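To show what a chained token like the one above does, here is a simplified resolver in Python (illustration only). It is not Drupal's Token API: the data model (entities as dicts, every field a list of items, reference targets under `entity`, plain values under `value`) and the choice to fan out over all values of multi-valued fields are assumptions made for the sketch.

```python
def resolve_token(token, entity):
    """Resolve a chained token such as [node:field_a:entity:field_b] against nested dicts.

    Simplified model (an assumption, not Drupal's Token API): an entity is a dict of
    field name -> list of items; a reference item keeps its target under 'entity';
    a plain item keeps its value under 'value'. Multi-valued fields fan out.
    """
    parts = token.strip("[]").split(":")
    current = [entity]  # parts[0] only names the entity type, e.g. 'node'
    for part in parts[1:]:
        step = []
        for item in current:
            if part == "entity":
                if item.get("entity") is not None:
                    step.append(item["entity"])
            else:
                step.extend(item.get(part, []))
        current = step
    return [item.get("value", item) for item in current]


node = {
    "field_rdm_contributors": [
        {"entity": {"field_rdm_person": [{"entity": {"field_rdm_personal_name": [{"value": "Ada Lovelace"}]}}]}},
        {"entity": {"field_rdm_person": [{"entity": {"field_rdm_personal_name": [{"value": "Alan Turing"}]}}]}},
    ]
}
token = "[node:field_rdm_contributors:entity:field_rdm_person:entity:field_rdm_personal_name]"
print(resolve_token(token, node))  # ['Ada Lovelace', 'Alan Turing']
```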

The Metatag config UI can also have custom form elements

[Screenshot (2020-09-30): Metatag configuration UI]

@mjordan
Contributor

mjordan commented Sep 30, 2020

@DiegoPino AFAIK this question is about serializing RDF only; we're not planning to move away from Drupal fields as the canonical home for data. We had a good discussion at today's tech call, but we opened up more questions than we answered. Thanks for reaching out. We of course look forward to opportunities to collaborate with the Archipelago community. Not sure yet how/if this is one of them, since at this point we're just questioning our current way of serializing field data into RDF.

@dannylamb
Contributor Author

@alxp Using your technique, is it possible to build a form separate from the one the Schema.org module provides?

@alxp
Contributor

alxp commented Oct 1, 2020

@dannylamb The Metatag config form is generated by gathering up all plugins that declare a @MetatagGroup annotation on their plugin class, and those classes extend the MetatagGroup class. So if we wanted Islandora-specific group(s), those could certainly live on their own page that only gets groups of a certain type, or maybe we declare a separate group plugin type.

@dannylamb
Contributor Author

I mean, if we can provide a way for someone to enable a module and now they have a form they can fill out to map fields to rdf with tokens, that certainly sounds like a nicer experience, albeit a bit more restrictive than what we have now.

Being able to let people have whatever fields they want but map them to a consistent dcterms representation would be pretty awesome.

@elizoller
Member

Personally I don't think tokens would be flexible enough at all. There are often cases where we need to enforce additional logic to achieve the desired RDF mapping from the Drupal fields.

@dannylamb
Contributor Author

I don't think @alxp's approach rules out other logic. You can alter the form generation / submit process in all the plugins, and you could brick 'em together into more complicated structures. At least I think so.

I know you're doing paragraphs and custom callbacks @elizoller, do you have an example of that? I'm curious.

I'm open to pretty much anything that would let us do something like nest the descriptive metadata about the thing under a # fragment and keep the modified/created date predicates (and other stuff about the web content itself) on the main URI. Or anything that would let you build up JSONLD that keeps a whole named graph about the node, the media, and the files. That type of thing.
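A named graph bundling a node with its media and files can be expressed in JSON-LD as a top-level object carrying both `@id` (the graph name) and `@graph`. The Python sketch below (illustration only) builds one. The graph-name pattern `<node>/graph` and the linking predicates `schema:associatedMedia` and `schema:contentUrl` are placeholder assumptions, not the predicates Islandora emits.

```python
import json

CONTEXT = {"dcterms": "http://purl.org/dc/terms/", "schema": "http://schema.org/"}


def node_named_graph(node_uri, node_props, media):
    """Bundle a node, its media and their files into one named graph.

    `media` is a list of dicts with 'uri', 'props' and 'file_uri'. The graph name
    and the linking predicates are placeholder choices for illustration.
    """
    node = {"@id": node_uri, **node_props,
            "schema:associatedMedia": [{"@id": m["uri"]} for m in media]}
    members = [node]
    for m in media:
        members.append({"@id": m["uri"], **m["props"],
                        "schema:contentUrl": {"@id": m["file_uri"]}})
    # A top-level object with both @id and @graph denotes a named graph.
    return {"@context": CONTEXT, "@id": node_uri + "/graph", "@graph": members}


graph = node_named_graph(
    "http://example.org/node/1",
    {"dcterms:title": "Annual report"},
    [{"uri": "http://example.org/media/5", "props": {"dcterms:format": "image/tiff"},
      "file_uri": "http://example.org/files/report.tif"}],
)
print(json.dumps(graph, indent=2))
```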

@alxp
Contributor

alxp commented Oct 5, 2020 via email

@elizoller
Member

Here's what I've come up with. I'm sure it's not perfect, but it's what we've got for now.
It has three parts: RDF Mapping, callbacks (referenced in the mapping), and the json_ld_alter hook

  1. RDF mapping: https://github.com/asulibraries/islandora-repo/blob/develop/config/sync/rdf.mapping.node.asu_repository_item.yml
  • There are quite a few special custom callbacks referenced there
  2. Special callbacks: https://github.com/asulibraries/islandora-repo/tree/develop/web/modules/custom/asu_custom_rdf/src
  • This is where the custom callback classes live.
  • Example 1:
uid:
    properties:
      - 'relators:dtc'
    mapping_type: property
    datatype_callback:
      callable: 'Drupal\asu_custom_rdf\UidLookup::username'

That one takes the user ID and turns it into the username instead, because our business case was to use the ASURITE ID (an ASU username for single sign-on) rather than the Drupal ID number of the user.
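The idea behind a UidLookup-style callback is a single value converter: take the stored user ID and emit something else in its place. In this Python illustration, the `USERS` table is a hypothetical stand-in for loading a Drupal account, and returning `None` for unknown IDs (so nothing is emitted) is an assumed design choice, not the ASU code's behaviour.

```python
# Hypothetical stand-in for loading a Drupal user account by ID.
USERS = {1: "admin", 42: "jdoe"}


def uid_to_username(uid, users=USERS):
    """Converter in the spirit of UidLookup::username: emit the account name, not the uid.

    The ID may arrive as a string, so it is normalised to int first.
    Unknown IDs return None (assumed choice: emit nothing rather than the raw ID).
    """
    return users.get(int(uid))


jsonld_fragment = {"relators:dtc": uid_to_username("42")}
print(jsonld_fragment)
```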

  • Example 2:
  field_note_para:
    properties:
      - 'mods:note'
      - 'dcterms:description'
    datatype_callback:
      callable: 'Drupal\asu_custom_rdf\ParagraphMapping::singlefield'
      arguments:
        - field_note_text

This might seem a little weird, but basically what we have here is a paragraph field (called field_note_para) on the ASU repository item. It has two fields in it: one for the type and one for the text of the note (field_note_text). In this case, for the RDF mapping, we decided not to represent the notes separately by type. So this callback takes as its argument the field you want to represent, and the singleField method just gets the value from that field (essentially ignoring the rest of the paragraph).
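A sketch of the "pick one subfield out of each paragraph" behaviour, in Python for illustration. Paragraphs are modelled as plain dicts of field name to a list of values, which is an assumption, and the type field name `field_note_type` is hypothetical (the comment above only says there is "one for type").

```python
def paragraph_single_field(paragraph_items, field_name):
    """Return the values of one subfield from each referenced paragraph, ignoring the rest.

    Sketch of the ParagraphMapping::singlefield idea; paragraphs are modelled
    as plain dicts of field name -> list of values (an assumption).
    """
    values = []
    for paragraph in paragraph_items:
        values.extend(paragraph.get(field_name, []))
    return values


notes = [
    {"field_note_type": ["general"], "field_note_text": ["Digitized in 2019."]},
    {"field_note_type": ["citation"], "field_note_text": ["Cite as: ASU Library"]},
]
print(paragraph_single_field(notes, "field_note_text"))
```

Each returned value would then be emitted under every property listed in the mapping (here mods:note and dcterms:description).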

  • Example 3:
  field_open_access:
    properties:
      - 'dcterms:accessRights'
    datatype_callback:
      callable: 'Drupal\asu_custom_rdf\ParseBoolean::tostring'
      arguments:
        1: 'open access'

Here we have a boolean field field_open_access and if the value is 1, we want to set the value of the dcterms:accessRights to the string 'open access'.

  • Example 4:
  status:
    properties:
      - 'asu:visibility'
    datatype_callback:
      callable: 'Drupal\asu_custom_rdf\ParseBoolean::tostring'
      arguments:
        - unpublished
        - published

Similar to field_open_access above, but here we're mapping both the 0 and 1 values to strings.
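Both ParseBoolean examples can be covered by one converter that accepts its static arguments either as a map keyed by the boolean value (`1: 'open access'`) or as a list indexed by it (`[unpublished, published]`). This Python version is an illustration; returning `None` when no string is configured (so the triple is simply not emitted) is an assumption about the desired behaviour.

```python
def boolean_to_string(value, arguments):
    """Map a boolean field value (0/1) to a configured string.

    `arguments` may be a dict keyed by the boolean value, e.g. {1: 'open access'},
    or a list indexed by it, e.g. ['unpublished', 'published']. A value with no
    configured string yields None, i.e. nothing is emitted (assumed behaviour).
    """
    key = 1 if int(value) else 0
    if isinstance(arguments, dict):
        return arguments.get(key)
    return arguments[key] if key < len(arguments) else None


print(boolean_to_string(1, {1: "open access"}))
print(boolean_to_string("0", ["unpublished", "published"]))
```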

  • Example 5:
  field_title:
    properties:
      - 'dcterms:title'
      - 'mods:title'
    datatype_callback:
      callable: 'Drupal\asu_custom_rdf\ParagraphMapping::titlepartmerge'
      arguments:
        nonsort: field_nonsort
        main: field_main_title
        subtitle: field_subtitle

This one is kind of fun. We have a paragraph title field (field_title) that contains several subfields including field_nonsort, field_main_title, and field_subtitle. The custom callback gets each of the field values and merges them into a single string.
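A possible shape for the title-merging callback, sketched in Python for illustration. The argument names mirror the YAML above, but the joining rules (a space after the non-sort part, ": " before the subtitle) are assumptions; the real callback may punctuate differently.

```python
def merge_title_parts(paragraph, nonsort="field_nonsort", main="field_main_title",
                      subtitle="field_subtitle"):
    """Join non-sort prefix, main title and subtitle into one display string.

    Sketch of the titlepartmerge idea. The joining rules (space after the
    non-sort part, ': ' before the subtitle) are assumptions.
    """
    def first(name):
        values = paragraph.get(name) or []
        return values[0].strip() if values else ""

    title = " ".join(part for part in (first(nonsort), first(main)) if part)
    sub = first(subtitle)
    return f"{title}: {sub}" if sub else title


print(merge_title_parts({
    "field_nonsort": ["The"],
    "field_main_title": ["Grand Canyon"],
    "field_subtitle": ["a history"],
}))
```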

  • Example 6:
  field_typed_identifier:
    properties:
      - 'dcterms:identifier'
      - 'mods:identifier'
    datatype_callback:
      callable: 'Drupal\asu_custom_rdf\ParagraphMapping::typedmap'
      arguments:
        type_field: field_identifier_type
        type_taxonomy_field: field_identifier_predicate
        value_field: field_identifier_value
        predicate: identifiers

Another fun one. We have a paragraph field (field_typed_identifier), similar to the typed note field I mentioned above: the paragraph contains a type field (field_identifier_type) and a value field (field_identifier_value). The type field is a taxonomy reference. The callback itself does a simple value mapping from the value_field provided, but the result is significantly changed in the json_ld alter hook (see https://github.com/asulibraries/islandora-repo/blob/develop/web/modules/custom/asu_custom_rdf/asu_custom_rdf.module#L92). That hook gets the taxonomy term from the type_field and then reads the predicate itself from the taxonomy term (field_identifier_predicate) to get the identifier type mapping. We're using id.loc.gov identifier types for this, with the goal of producing RDF such as <uri> identifiers:doi "doigoeshere". We went this way in an attempt to follow what MODS RDF recommends here: https://www.loc.gov/standards/mods/modsrdf/v1/#identifier
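The typed-identifier step (look up the type term, read the predicate stored on it, emit the value under that predicate) can be sketched in Python for illustration. The `terms` dict is a hypothetical stand-in for loading taxonomy terms, and falling back to `dcterms:identifier` when a term has no predicate is an assumption, not the ASU behaviour.

```python
def typed_identifiers(paragraphs, terms):
    """Turn typed-identifier paragraphs into a predicate -> values mapping.

    Each paragraph has a type (a taxonomy term ID) and a value; the term supplies
    the predicate, e.g. 'identifiers:doi'. `terms` is a hypothetical stand-in for
    loading taxonomy terms; untyped values fall back to dcterms:identifier (assumed).
    """
    out = {}
    for p in paragraphs:
        term = terms.get(p.get("field_identifier_type"))
        predicate = (term or {}).get("field_identifier_predicate") or "dcterms:identifier"
        out.setdefault(predicate, []).append(p["field_identifier_value"])
    return out


terms = {7: {"field_identifier_predicate": "identifiers:doi"}}
paragraphs = [
    {"field_identifier_type": 7, "field_identifier_value": "10.1000/xyz"},
    {"field_identifier_type": 99, "field_identifier_value": "local-1"},
]
print(typed_identifiers(paragraphs, terms))
```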

  3. json ld alter hook: https://github.com/asulibraries/islandora-repo/blob/develop/web/modules/custom/asu_custom_rdf/asu_custom_rdf.module
  • The primary purpose of that hook is to clean up the RDF. For example, if you change an ID field to a string, you have to remove the '@id'; or if you change a boolean to a string, you have to add the @language information.
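Both clean-ups amount to rewriting a JSON-LD value object as a language-tagged string literal: drop `@id` or `@type`, keep the text under `@value`, add `@language`. JSON-LD does not allow `@type` and `@language` on the same value object, which is why the type has to go. This Python sketch illustrates the transformation; the input shape and the `en` default are assumptions.

```python
def as_string_literal(value_obj, language="en"):
    """Rewrite a JSON-LD value object as a language-tagged string literal.

    A value whose '@id' was replaced by a plain string must become a '@value';
    a boolean turned into a string drops its '@type' and gains '@language'.
    The 'en' default is an assumption.
    """
    raw = value_obj.get("@value", value_obj.get("@id"))
    return {"@value": str(raw), "@language": language}


def clean_property(jsonld_node, prop, language="en"):
    """Apply as_string_literal to every value of one property, in place."""
    if prop in jsonld_node:
        jsonld_node[prop] = [as_string_literal(v, language) for v in jsonld_node[prop]]
    return jsonld_node


node = {
    "relators:dtc": [{"@id": "jdoe"}],
    "dcterms:accessRights": [{"@value": "open access", "@type": "xsd:boolean"}],
}
clean_property(node, "relators:dtc")
clean_property(node, "dcterms:accessRights")
print(node)
```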

@seth-shaw-unlv
Contributor

seth-shaw-unlv commented Oct 13, 2020

"RDF YAML files put a lot of the logic of data transformation, like dates, into YAML entries. This approach is similar to the Migrate approach, where you are basically doing programming, but YAML is your DSL."

Not as much as you might think. The Migrate API does allow you to put a lot of logic in the YAML via temp fields and chained processing plugins, but the RDF mapping + conversion classes don't. You simply get the option to pass a field value to a single converter (plus some static options, if the converter supports them). That is it. All the logic needs to live in those converters, just like the Schema/Metatag logic lives in the plugins. (Addendum: plus the logic we've been tossing into the JSON-LD alters.) As for the user experience, I agree that forcing users to edit YAML isn't ideal, but we could build a form interface for editing these YAML files with a drop-down of available converters.
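The difference between "one converter plus static options" and Migrate-style chaining, along with where a drop-down of converters would get its options, can be sketched as a small registry. This is an illustration in Python; the registry, the decorator, and the toy converters `uppercase` and `prefix` are all hypothetical and not part of the RDF module or the Migrate API.

```python
# Hypothetical converter registry: name -> callable(value, **options).
CONVERTERS = {}


def converter(name):
    """Decorator that registers a converter under a name a form could list."""
    def register(func):
        CONVERTERS[name] = func
        return func
    return register


@converter("uppercase")
def uppercase(value):
    return str(value).upper()


@converter("prefix")
def prefix(value, text=""):
    return f"{text}{value}"


def apply_entry(entry, value):
    """RDF-mapping style: exactly one converter, plus optional static options."""
    func = CONVERTERS[entry["converter"]]
    return func(value, **entry.get("options", {}))


def apply_pipeline(steps, value):
    """Migrate style, for contrast: each step's output feeds the next step."""
    for step in steps:
        value = apply_entry(step, value)
    return value


def converter_choices():
    """Options for a hypothetical drop-down in a mapping-edit form."""
    return sorted(CONVERTERS)


print(converter_choices())
print(apply_entry({"converter": "prefix", "options": {"text": "doi:"}}, "10.1000/xyz"))
```

With only `apply_entry` available, any multi-step transformation has to be written as a new converter; chaining moves that composition into configuration.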

That isn't to say I'm planting a flag in this strategy; I'm interested to see where this goes and open to options.

@kstapelfeldt kstapelfeldt added Type: documentation provides documentation or asks for documentation. Type: enhancement Identifies work on an enhancement to the Islandora codebase Subject: Linked Data related to linked data. Consider also using metadata or modelling tags. and removed architecture labels Sep 25, 2021
Development

No branches or pull requests

7 participants