-
Notifications
You must be signed in to change notification settings - Fork 4
MetadataLayer
Every API has a different way of representing bibliographic metadata (facts about a book, like its title) and licensing data (facts about access to a book, like the number of patrons in its holds queue). The metadata layer abstracts away the differences between APIs, and hides the complexity of the underlying data model.
When you learn something about a title in the collection, you need to
find out as much bibliographic information about that title as
possible, and create an Identifier
, Edition
, LicensePool
, and
Work
for it. The simplest way to do this is through the metadata
layer.
The metadata layer is defined in core/metadata_layer.py
.
A CirculationData
object contains information about a license for a
book. This includes how many copies of the title are available and in
which formats. You can provide all the necessary items in the
constructor:
-
data_source
: The name of the data source that provides the license. For your integration this will always be the constant you added to theDataSource
class. -
primary_identifier
: AnIdentifierData
object that lets the circulation manager know how it should refer to the title when talking to the license provider. This is usually some sort of proprietary ID, but sometimes it's ISBN. -
licenses_owned
: The number of licenses owned by the library that licensed this collection. If this collection doesn't track single-use licenses, it's okay to set this to 1 or 0, depending on whether the title is currently in the collection. -
licenses_available
: The number of licenses that can be loaned out right now. If this collection doesn't track single-use licenses, it's okay to treat this as a boolean: set it to 1 or 0, depending on whether the title can be loaned out right now. -
patrons_in_hold_queue
: The number of patrons who are waiting in line to borrow this book. If this collection doesn't track single-use licenses, this will probably always be 0. -
licenses_reserved
: The number of licenses that are in the following (somewhat specialized) state: someone put the book on hold, waited for it, and is now at the front of the queue. They can borrow the book right now, but it's not available to borrow in general.If you don't know how many titles are in this state, it's okay to set this to 0.
-
formats
: A list ofFormatData
objects that explain what formats have been licensed. There must be at least oneFormatData
object, or there will be no way to deliver the title to a patron.
There are two other constructor arguments designed for use with open-access collections:
-
default_rights_uri
: The terms under which this title is made available to the general public (not to the library that licensed this collection) This should be one of the constants defined incore/model.py:RightsStatus
. The default isIN_COPYRIGHT
, which is appropriate for any title that is not open-access. -
links
: This list may contain one or moreLinkData
objects which link to actual copies of the book.
A Metadata
object contains bibliographic information about an item.
-
data_source
: The name of the data source that provides the metadata. For your integration this will always be the constant you added to theDataSource
class. -
title
: The title of the item. -
subtitle
: The item's subtitle, if any. -
sort_title
: The item's sort title, if different from thetitle
. -
language
: The primary language of the item. This should be a three-character ISO 639-2 code like "eng". -
medium
: The medium of the item. This is one of the_MEDIUM
constants fromcore/model.py:Edition
; usuallyBOOK_MEDIUM
orAUDIO_MEDIUM
. -
series
: If this item belongs to a series of items, the name of the series. -
series_position
: The item's numeric position within its series, if it has one. -
publisher
: The publisher that issued this electronic edition. -
imprint
: The imprint ofpublisher
, if any, that issued this electronic edition. -
issued
: A `datetime object indicating when this electronic edition of the work was first made available. This date should be within the past few years. -
published
: Adatetime
object indicating when this work was originally published. This date might be hundreds of years in the past. -
primary_identifier
: AnIdentifierData
object that lets the circulation manager know how it should refer to the title when talking to the data provider. This is usually some sort of proprietary ID, but sometimes it's ISBN. -
identifiers
: A list ofIdentifierData
objects representing other ways the data provider has of referring to this title. There really ought to be an ISBN in here. -
subjects
: A list ofSubjectData
objects representing the subjects under which the data provider thinks this title should be filed. -
contributors
: A list ofContributorData
objects representing the authors and other people who contributed to this title. -
measurements
: A list ofMeasurementData
objects representing measurements the data provider has taken of this title. -
links
: A list ofLinkData
objects representing related resources such as cover images. -
recommendations
: A list ofIdentifierData
objects representing other titles that the data provider recommends for people who liked this title. -
circulation
: An optionalCirculationData
. If your API provides metadata and circulation information at the same time, you can create aCirculationData
object and carry it around inside aMetadata
object.
Note that there is no special place to put a description of the book, as you might find on the back cover. A book's description is contained in a LinkData
object (see below) with rel="description"
, which goes into links
.
The Metadata
class acts as a container for lots of other objects. Like Metadata
, these objects are convenient stand-ins for data model objects. Unlike Metadata
, you don't need to call apply()
on these objects to write them to the database. You just associate them with a Metadata
and they're written to the database with everything else when you call Metadata.apply()
.
The most important of these objects is ContributorData
, which represents a person or person-like entity that contributed to the production of a title. It can contain the following information:
-
display_name
: The person's name as it would show up on the front of a book jacket, e.g. "Stephen King". -
sort_name
: The person's name as it would show up in an alphabetized card catalog, e.g. "King, Stephen". -
family_name
: The person's family name, as it might show up on the side of a book jacket, e.g. "King". -
roles
: A list of roles this person had in the creation of this book. TheContributor
class in core/model.py lists about thirty common roles ranging fromEDITOR_ROLE
toLYRICIST_ROLE
. The ones you'll use most often arePRIMARY_AUTHOR_ROLE
andAUTHOR_ROLE
. -
wikipedia_name
: The name of the person's Wikipedia page, e.g. "Stephen_King" -
lc
: The person's LCCN, e.g. "n79063767" -
viaf
: The person's VIAF number, e.g. "97113511" -
biography
: The person's biography, as you might see on an inside book jacket. -
aliases
: A list of alternate names for this person, e.g.["Richard Bachman"]
Of these, the only ones that are really important are display_name
, sort_name
, and roles
. If we have display_name
we can try to automatically figure out sort_name
, and vice versa, but we don't always get it right.
This class represents a decision to classify a book under one subject or another.
-
type
: The classification scheme in use. TheSubject
class in core/model.py defines a number of industry standard types such asDDC
andBISAC
, as well as
types that are specific to third party data providers likeOVERDRIVE
. The catch-allTAG
is designed to hold unstructured tags.If your data source defines its own classification scheme, you can add a constant for it in
Subject
. You'll also need to add aClassifier
subclass to core/classifier.py to boil down the classifications into the genres defined by Library Simplified. -
identifier
: The identifying string for the subject in use, e.g. "032" for Dewey Decimal Classification 032. -
name
: The human-readable name associated with the subject, if different from the identifier. For example, theidentifier
of Dewey Decimal Classification 032 is "032" and thename
is "Encyclopedias in English". -
weight
: A number indicating how strongly you believe this book should be classified under this subject.There are no hard and fast rules here, but in general this number should range from 1 to 100, with 100 meaning that you trust this data a lot, and 1 meaning that it's just a suggestion. This number can have different values for different classifications from the same data source. We weight Overdrive classifications very highly, but we give very little weight to the freeform tags provided by Overdrive.
This class represents an identifying string used to distinguish this book from other books.
-
type
: TheIdentifier
class in core/model.py defines a number of standard types such asISBN
andURI
, as well as
types that are specific to third parties likeOVERDRIVE_ID
andASIN
. You can add an identifier type by adding a constant to this class. -
identifier
: The identifying string itself, e.g. "9781453219539" or "019f21e3-9de9-4c40-95a4-dfabf55e7801" or "https://standardebooks.org/ebooks/miguel-de-cervantes-saavedra/don-quixote/john-ormsby/". -
weight
: On a scale of 0 to 1, how certain are you that this string identifies the book whoseMetadata
you're putting it into?If this is true by definition, 1 is an appropriate value. If Overdrive says that the Overdrive ID of a book is "019f21e3-9de9-4c40-95a4-dfabf55e7801", then it's "019f21e3-9de9-4c40-95a4-dfabf55e7801", by definition. Otherwise it's a judgment call, depending on how much you trust the data. If you guessed at an identifier based on a title/author match, then this number won't be terribly high.
Most commercial data sources include a proprietary ID as well as an ISBN for the same book. In this case it's appropriate to assign 1 to both identifiers, but the proprietary ID should be used as the primary identifier.
For a book's
primary_identifier
, this value must be 1.
This class represents a format in which the book is available. We consider 'format' to cover both the type of the file and the DRM hoops one must jump through to get the file.
-
content_type
: The media type of the file or files that contain the book. TheRepresentation
class in core/model.py defines a number of constants for common media types likeEPUB_MEDIA_TYPE
. -
drm_scheme
: The DRM scheme used to encrypt or obscure access to the book. TheDeliveryMechanism
class in core/model.py defines a number of constants for common DRM schemes likeADOBE_DRM
. -
link
: ALinkData
object representing an actual freely-available copy of the book in this format. If you put aLinkData
in here, there's no need to also include that object inMetadata.links
. -
rights_uri
: The license under which this book is offered to the general public. TheRightsStatus
class in core/model.py defines constants for the various Creative Commons licenses and a few others. The default isRightsStatus.IN_COPYRIGHT
, indicating a book that is not available to the general public except through special arrangement.
Note that adding a new content_type
or drm_scheme
will not make the system support the underlying content type or DRM scheme. It merely allows the system to record the fact that a third party has made a book available in a certain format and through a certain DRM scheme.
This class represents a document associated with this book; usually a cover image, a description, or an electronic copy of the book. The assumption is that this file can be downloaded by anyone without authentication, and opened up without any special tools. Which is to say, this is not how DRM-encrypted books are delivered.
-
href
: The URL to the file. -
media_type
: The media type of the file. -
content
: In lieu of a URL, you can provide the actual content of a file. This is most often used for textual descriptions of a book. -
rel
: The relationship between the book and the document, or 'link relation'. TheHyperlink
class in core/model.py defines a number of link relations, the most important beingIMAGE
(for a cover image),DESCRIPTION
(for a textual description), andOPEN_ACCESS_DOWNLOAD
(for a freely available copy of a book). -
thumbnail
: If this is an image, and you also know the URL of a smaller version of the image, you can create aLinkData
object for the smaller version and make it thethumbnail
of the full-size image.
Sometimes data sources take measurements of a book's quality, popularity, reading level, or some other numeric value. These values can go into MeasurementData
objects.
You can store as much data here as you want, but to get it to actually do something you'll need to change one of the methods in the Measurement
class in core/model.py. This stuff has to be calibrated. To take an obvious example, Amazon's "sales rank" and Overdrive's "popularity" both measure the same quantity (popularity), but a low sales rank means a book is very popular while a low Overdrive popularity means the opposite.
-
quantity_measured
: The quantity being measured. Some popular measurements are defined in theMeasurement
class. Again, adding a new type of measurement (or even an existing measurement from a new source) won't do anything. The measurement values have to be calibrated, which requires changing theMeasurement
class. -
value
: The value of the measurement. -
weight
: How confident you are in the accuracy of the measurement, relative to other measurements of the same quantity from the same source. This should be a number from 0..1. -
taken_at
: The date at which the measurement was taken. In general, only the most recent measurement is considered.
When you create a Metadata
or CirculationData
object, you've captured the current state of affairs as described on some remote server. Now you need to make the local database reflect that state of affairs. The apply
method will take your metadata-layer object and create or modify the necessary database objects to make the database reflect what the third party says.
-
collection
is aCollection
object representing the collection that contains the book you're talking about. If necessary,apply
will create aLicensePool
in that collection representing the fact that this book is available in this collection. -
replace
is aReplacementPolicy
object (see below).
-
edition
is anEdition
object representing bibliographic information about this book. You can get an appropriate edition by callingedition(_db)
on theMetadata
object. -
collection
is aCollection
object representing the collection that contains the book you're talking about. If you're talking about the book in abstract terms, rather than any particular copies of the book, you can leave this blank.
When you call an apply
method, you're supposed to pass in a
ReplacementPolicy
object as well. This object sets the rules for when a value overrides a previously existing value in the database, versus when it extends previously existing values in the database.
In practice, the details don't matter much. When you're dealing with actual copies of the books (and thus, with CirculationData
), you probably want to use ReplacementPolicy.from_license_source
. When you're just editing information about a book (so you have a Metadata
but no CirculationData
), you want to use ReplacementPolicy.from_metadata_source
.
TBD