The script checks each record's MARC 260/264 subfield $a/$e values and the MARC country code in the 008 field against a dataframe version of the RBMS/BSC Latin Place Names File, supplies normalized geographic data in a MARC 752 field when possible, and writes the results to three separate spreadsheets for review. Because MARC has a flat structure, some use cases cannot be handled by a fully automated process. Use this script in consultation with rare materials catalogers, and adjust it to your local cataloging policies for implementing the MARC 752 field.
MARC record evaluation and dataframe querying take place before a record is passed to one of the script's three PyMARC writers. The evaluation process routes each record in the original MARC file down one of three paths, based on the data the record contains.
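A minimal sketch of the three-way routing, assuming the three paths are (1) records that already carry a 752, (2) records that received a new 752, and (3) records left without one. The enum, its values, and `route_record` are hypothetical names, and a record is reduced to its list of tags instead of a PyMARC `Record` object.

```python
from enum import Enum
from typing import List, Optional, Tuple


class OutputPath(Enum):
    """Hypothetical labels for the three output files."""
    HAS_752 = "existing_752_review"
    ENRICHED = "752_added"
    NOT_ENRICHED = "752_not_added"


def route_record(tags: List[str], new_752: Optional[List[Tuple[str, str]]]) -> OutputPath:
    """Choose an output path for one record.

    tags: MARC tags present in the record, e.g. ["008", "245", "260"].
    new_752: (code, value) subfields built from a dataframe match, or None.
    """
    if "752" in tags:
        # An existing 752 is never overwritten; the record goes to manual review.
        return OutputPath.HAS_752
    if new_752:
        return OutputPath.ENRICHED
    return OutputPath.NOT_ENRICHED
```

In a full pipeline, each `OutputPath` value could key its own `pymarc.MARCWriter`, so that every record is written to exactly one output file.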
The script identifies MARC records with MARC 752s already present and writes them to a file for manual review and possible URI enrichment.
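As an illustration of "possible URI enrichment", the sketch below flags existing 752 fields that lack a $1 (Real World Object URI). It assumes that enrichment means adding a $1; the function name is hypothetical, and fields are modeled as `(tag, [(code, value), ...])` pairs rather than PyMARC objects.

```python
from typing import List, Tuple

Subfields = List[Tuple[str, str]]  # (code, value) pairs in field order


def existing_752s_without_uri(fields: List[Tuple[str, Subfields]]) -> List[Subfields]:
    """Return each existing 752 in a record that lacks a $1 (Real World Object URI).

    fields: (tag, subfields) pairs for every data field in the record.
    """
    flagged = []
    for tag, subfields in fields:
        # Only 752s without any $1 are candidates for URI enrichment.
        if tag == "752" and not any(code == "1" for code, _ in subfields):
            flagged.append(subfields)
    return flagged
```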
The Python script stores the MARC country code from the 008 field and the MARC 260/264 subfield $a in variables, then queries the pandas dataframe for a matching row. If a match exists, the script assembles the subfield components of the MARC 752 field and adds the field to the MARC record.
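A minimal sketch of the lookup step, assuming the country code is read from 008/15-17 (the place of publication code) and that the dataframe has hypothetical columns `latin`, `country_code`, `a`, `b`, and `d`. The punctuation stripping in `normalize_place` is an illustrative assumption, not the script's actual normalization, and a match is accepted only when exactly one row qualifies.

```python
import pandas as pd

# Hypothetical rows and column names; the real CSV's headers may differ.
places = pd.DataFrame(
    [
        {"latin": "Venetiis", "country_code": "it", "a": "Italy", "b": None, "d": "Venice"},
        {"latin": "Lutetiae Parisiorum", "country_code": "fr", "a": "France", "b": None, "d": "Paris"},
    ]
)


def country_code_from_008(field_008: str) -> str:
    """Return 008/15-17 (place of publication code) without blank padding."""
    return field_008[15:18].strip()


def normalize_place(place: str) -> str:
    """Strip brackets and trailing ISBD punctuation from a transcribed 260/264 $a."""
    return place.strip(" []:;,.")


def build_752(field_008: str, place_a: str, table: pd.DataFrame):
    """Return (code, value) subfields for a new 752, or None if there is no unique match."""
    country = country_code_from_008(field_008)
    place = normalize_place(place_a)
    matches = table[(table["latin"] == place) & (table["country_code"] == country)]
    if len(matches) != 1:
        # No match, or an ambiguous one: leave the record for a cataloger.
        return None
    row = matches.iloc[0]
    # Keep subfields in MARC order and skip empty ones (e.g. no $b for a city-state).
    return [(sub_code, row[sub_code]) for sub_code in ("a", "b", "d") if pd.notna(row[sub_code])]
```

For example, `build_752("850101s1565    it " + " " * 22, "Venetiis :", places)` yields `[("a", "Italy"), ("d", "Venice")]`. In PyMARC 5.x these pairs would become `Subfield(code=..., value=...)` objects passed to `Field(tag="752", indicators=[" ", " "], subfields=[...])`.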
By design, the script does not add a MARC 752 field for some use cases. They can be broadly categorized into the following groups:
- The script identifies potential errors present in the MARC record.
- The script identifies an imprint location that could refer to several places, so the cataloger must evaluate the record manually and in greater context.
- The script cannot identify the imprint place.
- The script user needs to add additional imprint information to the CSV file.
- The MARC record lists two places of imprint.
- The script identifies evidence of a reproduction.
In these cases, a rare materials cataloger must resolve and verify the imprint place manually, consulting bibliographic utilities and researching the typical locations of the publisher, manufacturer, etc. Where possible, the script broadly identifies the reason a 752 field was not added.
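The groups above can be approximated by a simple reason classifier. This is a sketch under stated assumptions: a 533 (Reproduction Note) field is taken as evidence of a reproduction, more than one 260/264 $a as two places of imprint, and the match count from the dataframe distinguishes ambiguous places from places missing in the CSV. The function name and reason strings are hypothetical, and the "potential errors" group is not modeled.

```python
from typing import List, Optional


def skip_reason(place_values: List[str], has_533: bool, match_count: int) -> Optional[str]:
    """Return a hypothetical reason for not adding a 752, or None if one can be added.

    place_values: every 260/264 $a in the record.
    has_533: True if the record carries a 533 reproduction note (assumed signal).
    match_count: number of dataframe rows matching the imprint place.
    """
    if has_533:
        return "reproduction"
    if not place_values:
        return "no imprint place"
    if len(place_values) > 1:
        return "two places of imprint"
    if match_count > 1:
        # e.g. "Lugduni" may mean Lyon or Leiden (Lugduni Batavorum).
        return "ambiguous place"
    if match_count == 0:
        return "not in CSV"
    return None
```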
This CSV file is loaded into the script as a pandas (https://pandas.pydata.org/index.html) dataframe. It contains every entry in the RBMS/BSC Latin Place Names File (http://rbms.info/lpn/) as well as additional entries drawn from rare materials bibliographic practice. Each row is also populated with the values that make up the MARC 752 subfields: country ($a), first-order political jurisdiction (i.e., state or province) ($b), city ($d), relator term ($e), relator term URI ($4), and Real World Object URI ($1). Script users can add locations and place name variations to the CSV for processing, which makes the design flexible, locally customizable, and responsive. Because imprint locations in rare materials are transcribed in nearly endless variations, the MasterChartcleaner list will never be exhaustive; it is designed as an iterative solution.
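A minimal sketch of the iterative workflow, assuming local additions are kept as extra CSV text that is concatenated onto the shipped table. The column names and `load_places` are hypothetical, and in-memory strings stand in for files; with real files, `pd.read_csv` would take a path instead.

```python
import io

import pandas as pd

# Stand-in for the shipped CSV; the column names are hypothetical.
SHIPPED_CSV = """latin,a,b,d
Venetiis,Italy,,Venice
Lugduni Batavorum,Netherlands,South Holland,Leiden
"""

# Local additions a script user might append between runs (one is a duplicate).
LOCAL_CSV = """latin,a,b,d
Venetijs,Italy,,Venice
Venetiis,Italy,,Venice
"""


def load_places(*sources: str) -> pd.DataFrame:
    """Concatenate place-name tables and drop exact duplicate rows."""
    # dtype=str keeps values as text; empty cells still load as NaN.
    frames = [pd.read_csv(io.StringIO(src), dtype=str) for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates(ignore_index=True)
```

Dropping exact duplicates lets users append variants freely without worrying about repeating rows that already exist in the shipped list.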
The CSV version of the Latin Place Names File has also been enriched with data from other bibliographic utilities. Please report any errors in the CSV to [email protected] or open an issue.
This binary MARC file is supplied for testing purposes.
The Python script also exports additional pandas (https://pandas.pydata.org/index.html) dataframes as CSV files. These CSV files are meant for human review: at a glance, they can help diagnose errors in the MARC records and verify components of the MARC 260/264/752 fields. The DAT files are UTF-8 binary MARC.
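A minimal sketch of a review export, assuming one row per record with hypothetical columns; the script's actual CSV layout may differ. Keys missing from a row dict become empty cells.

```python
import pandas as pd

# Hypothetical review columns; adjust to whatever catalogers need to see.
REVIEW_COLUMNS = ["control_number", "place_260_264", "country_008", "field_752", "note"]


def review_table(rows: list) -> pd.DataFrame:
    """Build a one-row-per-record review table from a list of dicts."""
    return pd.DataFrame(rows, columns=REVIEW_COLUMNS)


def write_review_csv(rows: list, path: str) -> None:
    """Write the review table as UTF-8 CSV without the dataframe index."""
    review_table(rows).to_csv(path, index=False, encoding="utf-8")
```

The binary MARC side is separate: each DAT file could be written with `pymarc.MARCWriter`, opened on a file in binary mode.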