The Metadata Bus was created to automate the management of data loaded into Solr for the Canadiana Access Platform (CAP). The metadata is based on a schema called the Canadiana Metadata Repository (CMR) record.
The Metadata Bus is a series of data processing scripts and tools that allow metadata to flow between stages, from when an artifact is first acquired by Canadiana to when it is viewable on the platform.
The outputs of the Metadata Bus processes are derivatives of the source data collected during the preservation and archiving processes, formatted for easy public consumption.
Changes and additions to the source data are queued, processed, and then propagated to the public platforms.
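As a rough illustration of the queue-then-process flow described above, the following sketch uses an in-memory queue. The names `derive` and `process_queue`, the record fields, and the dictionary standing in for a public platform are all hypothetical and are not taken from the Metadata Bus code.

```python
from collections import deque


def derive(source_record):
    """Hypothetical processing step: build a public-facing derivative
    from a source record."""
    return {"id": source_record["id"], "label": source_record["title"].strip()}


def process_queue(queue, platform):
    """Drain the queue in arrival order and write each derivative to the
    platform store (a plain dict here, as an assumption)."""
    processed = []
    while queue:
        record = queue.popleft()
        platform[record["id"]] = derive(record)
        processed.append(record["id"])
    return processed


queue = deque()
platform = {}
queue.append({"id": "a1", "title": " First title "})
queue.append({"id": "a1", "title": "Revised title"})  # a later change to the same record
queue.append({"id": "b2", "title": "Second title"})
processed_ids = process_queue(queue, platform)
```

Because the queue is drained in arrival order, a later change to the same record overwrites the earlier derivative, so the platform store ends up reflecting the most recent source data.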
The Metadata Bus includes the following services:
Reads METS records from the repository and generates canvas and manifest records.
Manifest records have a noid for an _id and a slug, which is set by Smelter.
Canvas records have a noid for an _id, and are not tied to any specific manifest.
Handles the processing of individual manifests
Uses a _view on the manifest and collection documents to read data from those documents, and potentially from XML descriptive metadata files (in Swift), then updates the cosearch and copresentation databases.
Streams updates that occur in the search database to individual Solr cores. Solr is an enterprise search engine platform.
Keeps the dipstaging and wipmeta databases up to date with the public availability of replicas of AIP content in repositories.
todo
todo
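The manifest and canvas records described in the service list above can be pictured with a small sketch. This is only an assumption about their shape: the class names, the `canvas_ids` field, and the identifier values are hypothetical, and only the noid used as `_id` and the manifest slug come from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Canvas:
    noid: str  # stored as the record's _id; canvases carry no manifest reference


@dataclass
class Manifest:
    noid: str  # stored as the record's _id
    slug: str  # human-readable identifier, set by Smelter
    canvas_ids: List[str] = field(default_factory=list)  # hypothetical field


# Placeholder identifiers, not real noids.
page = Canvas(noid="canvas-noid-1")
first = Manifest(noid="manifest-noid-1", slug="example.1", canvas_ids=[page.noid])
second = Manifest(noid="manifest-noid-2", slug="example.2", canvas_ids=[page.noid])

shared = set(first.canvas_ids) & set(second.canvas_ids)
```

Since a canvas is not tied to any specific manifest, nothing in this model prevents the same canvas from being listed by more than one manifest, as `shared` shows.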
The above services interact with the following 'Access Databases':
Derived Data
Ids are AIP IDs
Used by Smelter and reposync
Process data from the repository to create manifests
Documents are created by reposync on data in the repository
Source Data
Ids are noids
Used to store information about individual images
Analogous to sequences within internalmeta records
Source Data
Ids are noids
Used to store information about groups of canvases
Analogous to internalmeta records
Source Data
Ids are noids
Used to store information about groups of manifests and/or other collections.
Combines the concepts of both series records and the collection tags in internalmeta
An ordered collection references its child manifests
Previously, an issue pointed to its parent series
Derived Data
Ids are noids or slugs
Analogous to cosearch database
Streamed to Solr
Derived Data
Ids are noids or slugs
Analogous to copresentation database
Read by CAP (Canadiana Access Platform)
todo
todo
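The change in direction of the parent/child reference noted above (an ordered collection now references its child manifests, where an issue used to point to its parent series) can be sketched as follows. The field names `parent` and `members`, the function names, and the identifiers are hypothetical and chosen only for illustration.

```python
def children_from_parent_pointers(issues, series_id):
    """Earlier model: each issue names its parent series, so the children
    are found by scanning every issue."""
    return [issue["id"] for issue in issues if issue["parent"] == series_id]


def children_from_collection(collection):
    """Current model: the ordered collection itself lists its child
    manifests, in the order they should appear."""
    return list(collection["members"])


# Earlier model: order depends on how the issues happen to be stored.
issues = [
    {"id": "issue-2", "parent": "series-a"},
    {"id": "issue-9", "parent": "series-b"},
    {"id": "issue-1", "parent": "series-a"},
]
old_children = children_from_parent_pointers(issues, "series-a")

# Current model: order is part of the collection record.
collection = {"id": "collection-a", "members": ["manifest-1", "manifest-2"]}
new_children = children_from_collection(collection)
```

Under these assumptions, the collection-side list makes the ordering explicit and avoids a scan over all child records to assemble a series.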
Other key documents:
- Building, testing and deploying software
- History of software stack
- CMR and Crosswalks to CMR
- Descriptive Metadata tasks (microservice for "Load Metadata")
- Descriptive Metadata Tools
Other relevant repositories:
- CAP front-end
- Solr configuration
- CMR version 1.2 XML Schema Definition (XSD)