Zephir 880 Dataset and Analysis

This repository contains scripts that generate an 880 dataset for analysis. HathiTrust is analyzing all volume records that include an 880-245 and requires data to be extracted from Zephir for this analysis. Two scripts generate this dataset.

Requirements

Python 3.12+
Poetry
Database credentials in a .env file
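
The .env file holds simple KEY=VALUE pairs. As a minimal sketch of how such a file can be read with only the standard library, the example below parses the text into a dictionary. The variable names DB_HOST and DB_USER are hypothetical; the scripts in this repository may expect different names or load the file with a different tool.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines that have no "=" separator
        # Drop optional surrounding quotes from the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical variable names, used here for illustration only.
sample = """
# Zephir database connection
DB_HOST=localhost
DB_USER="zephir_reader"
"""

config = parse_env(sample)
print(config)
```

In practice the text would come from reading the .env file in the repository root rather than from an inline string.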

880 Volume Dataset and Analysis

Script: create_880_volumes_dataset.py
Output: 880_volumes_dataset.csv
Analysis: 880_volumes_analysis.ipynb (Jupyter Notebook; very limited analysis)

Generates a dataset of volumes that contain an 880-245. While it can be used to count all volumes with an 880-245 in the supplied record, its primary purpose is to provide the input for the record dataset. The dataset includes the following columns:

cid
namespace
contribsys_id
htid
language
var_usfeddoc
var_score
vufind_sort
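
As a rough illustration of the counting use case, the sketch below tallies volumes per namespace from CSV text that uses the columns listed above. The sample rows are invented for illustration and are not taken from Zephir, and count_by_column is a hypothetical helper, not a function from this repository.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count rows in a CSV file object, grouped by the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Invented sample rows; real values come from 880_volumes_dataset.csv.
sample = io.StringIO(
    "cid,namespace,contribsys_id,htid,language,var_usfeddoc,var_score,vufind_sort\n"
    "100,aaa,b1,aaa.001,jpn,0,1,sort1\n"
    "100,aaa,b1,aaa.002,jpn,0,1,sort1\n"
    "101,bbb,x9,bbb.001,chi,0,2,sort2\n"
)

counts = count_by_column(sample, "namespace")
print(counts)
```

To run this against the real output, pass an open file handle for 880_volumes_dataset.csv (opened with newline="") instead of the in-memory sample.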

880 Record Dataset

Script: create_880_records_dataset.py
Output: 880_records_dataset.tsv
Analysis: 880_records_analysis.ipynb (Jupyter Notebook)

Generates a dataset of records with 880-245 fields. The script uses the output of the volume dataset script and makes the following changes:

Removes duplicates of the same record, which is common when one contributor has multiple volumes with the same bibliographic record. It achieves this by keeping only one entry per contributor system ID (ILS number).
Calculates the expected bibliographic data selection order Zephir will use when building a combined cluster record.
Adds the title field and associated 880 field. If there is an error in the 880 field linking, the script will use null values for the 880.
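
The deduplication step above can be sketched as follows. This is a minimal illustration, not the script's actual code: it keeps the first row seen for each contributor system ID. The helper name dedupe_records and the sample rows are hypothetical, and keying on contribsys_id alone is an assumption; if IDs can repeat across contributors, adding namespace to key_fields would be one way to avoid collisions.

```python
def dedupe_records(rows, key_fields=("contribsys_id",)):
    """Keep the first row for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            continue  # another volume of an already-kept record
        seen.add(key)
        unique.append(row)
    return unique


# Invented rows: two volumes share one bibliographic record (b1).
volumes = [
    {"cid": "100", "namespace": "aaa", "contribsys_id": "b1", "htid": "aaa.001"},
    {"cid": "100", "namespace": "aaa", "contribsys_id": "b1", "htid": "aaa.002"},
    {"cid": "101", "namespace": "bbb", "contribsys_id": "x9", "htid": "bbb.001"},
]

records = dedupe_records(volumes)
print([r["htid"] for r in records])
```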

The dataset includes the following columns:

cid
namespace
contribsys_id
htid
language
selection order
title
880
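
The title and 880 columns rely on MARC field linking: a 245 field points to its alternate-script 880 through subfield $6 (for example 880-01), and the matching 880 carries a $6 that points back (for example 245-01/$1). The sketch below shows one way to resolve that link and fall back to a null value when the link is missing or broken. Fields are modeled as plain dictionaries for illustration; the function name find_linked_880 and this data layout are assumptions, not the representation used by the script.

```python
def find_linked_880(field_245, fields_880):
    """Return the 880 linked to the 245 via subfield $6, or None."""
    link = field_245["subfields"].get("6", "")
    tag, _, occurrence = link.partition("-")
    occurrence = occurrence[:2]
    # Occurrence "00" means the field has no linked counterpart.
    if tag != "880" or len(occurrence) != 2 or occurrence == "00":
        return None
    for field in fields_880:
        target = field["subfields"].get("6", "")
        linked_tag, _, rest = target.partition("-")
        # "245-01/$1" links back to tag 245, occurrence 01.
        if linked_tag == "245" and rest[:2] == occurrence:
            return field
    return None  # link could not be resolved


# Invented example fields.
title = {"tag": "245", "subfields": {"6": "880-01", "a": "Nihon no rekishi"}}
alternates = [
    {"tag": "880", "subfields": {"6": "100-02/$1", "a": "著者"}},
    {"tag": "880", "subfields": {"6": "245-01/$1", "a": "日本の歴史"}},
]

linked = find_linked_880(title, alternates)
print(linked["subfields"]["a"] if linked else None)
```

Returning None on any mismatch mirrors the behavior described above, where a linking error produces null values in the 880 column rather than stopping the run.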
