
chrpr/ead2rdf2solr

Latest commit: bfe37c4 · Oct 1, 2015 (42 commits)

ead2rdf2solr

A first release of code that manages a modified Blacklight Solr index populated with disaggregated EAD collections and components, including records for the "entities" described in either the origination or controlaccess elements of the EAD. It enriches these entities with data from DBpedia, and soon from VIAF, id.loc.gov, FAST and others. This heterogeneous data is stored in 4store and then appended to the Solr records at the SOLRization stage using SPARQL. The code can also generate Turtle (.ttl) files for use with content negotiation.
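The entity-gathering step described above can be pictured roughly as follows. This is a minimal sketch, not the project's own code: it uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra dependencies, it assumes EAD 2002-style markup (with or without the EAD namespace), and the helper name extract_entities, the set of name elements it collects, and the sample record are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Name-bearing elements treated as "entities" in this sketch (an assumption;
# the project may collect a different set).
ENTITY_TAGS = ("persname", "corpname", "famname", "geogname", "subject")

# Invented sample record, for illustration only.
SAMPLE_EAD = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did>
      <origination label="Creator">
        <persname>Smith, Jane,
          1900-1980</persname>
      </origination>
    </did>
    <controlaccess>
      <corpname>Example Corporation</corpname>
      <subject>Labor unions</subject>
    </controlaccess>
  </archdesc>
</ead>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so namespaced and plain EAD both work."""
    return tag.rsplit("}", 1)[-1]


def extract_entities(ead_xml):
    """Hypothetical helper: return (context, element, text) tuples for the
    names found directly inside <origination> and <controlaccess>."""
    root = ET.fromstring(ead_xml)
    entities = []
    for container in root.iter():  # document order, including nested elements
        context = local_name(container.tag)
        if context not in ("origination", "controlaccess"):
            continue
        for child in container:  # direct children only, so nothing is counted twice
            element = local_name(child.tag)
            if element in ENTITY_TAGS:
                # Join all text inside the element and collapse whitespace.
                text = " ".join("".join(child.itertext()).split())
                if text:
                    entities.append((context, element, text))
    return entities


if __name__ == "__main__":
    for entity in extract_entities(SAMPLE_EAD):
        print(entity)
```

Each tuple keeps the element it came from, so a later stage could, for example, treat origination names as creators and controlaccess terms as subjects before looking them up in DBpedia.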

Installation

See requirements.txt for a list of library dependencies. You can use this to bootstrap your environment with virtualenv like this (presuming you already have virtualenv):

% git clone https://github.com/chrpr/ead2rdf2solr.git
% cd ead2rdf2solr
% virtualenv --no-site-packages ENV
% source ENV/bin/activate
(ENV)% pip install -r requirements.txt

Usage

Usage instructions and some examples will be added here soon. For now, a single command-line script (ead2rdf.py) manages a variety of transformations. One of the first issues I want to tackle is making it argument-driven, rather than requiring various chunks of code to be commented out to run different sub-processes.
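One possible shape for that argument-driven format is sketched below with the standard library's argparse and one subcommand per sub-process. This is a hypothetical layout only: the subcommand names (convert, enrich, solrize, ttl) and the --input option are placeholders, not options that ead2rdf.py currently accepts.

```python
import argparse

# Hypothetical sub-processes, one per chunk that is currently commented in/out.
SUBCOMMANDS = [
    ("convert", "transform EAD files to RDF"),
    ("enrich", "add linked data from external sources such as DBpedia"),
    ("solrize", "build Solr records, pulling data from 4store via SPARQL"),
    ("ttl", "write .ttl files for content negotiation"),
]


def build_parser():
    """Build a parser where each sub-process is its own subcommand."""
    parser = argparse.ArgumentParser(prog="ead2rdf.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in SUBCOMMANDS:
        cmd = sub.add_parser(name, help=help_text)
        cmd.add_argument("--input", help="input file or directory (placeholder)")
    return parser


def main(argv=None):
    """Parse arguments (sys.argv[1:] when argv is None) and report the choice.

    A real script would dispatch to the matching sub-process here.
    """
    args = build_parser().parse_args(argv)
    print(f"Selected sub-process: {args.command} (input: {args.input})")
    return args
```

A script built this way would end with the usual `if __name__ == "__main__": main()` guard and be run as, say, `python ead2rdf.py enrich --input ead/`. Omitting the subcommand makes argparse print a usage error instead of silently running nothing.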

PyPi Dependencies

Install each library with pip (pip install $LIB): lxml, rdflib, sunburnt, HTTP4Store, nltk, fuzzywuzzy

Running Tests

python -m unittest discover
