Kumquat is an implementation of the Illinois Digital Special Collections Service.
This is a getting-started guide for developers.
- AWS CLI with awscli-login plugin
- Administrator access to the production DSC instance (in order to export data to import into your development instance)
- PostgreSQL >= 9.x
- An S3 server (MinIO Server, s3proxy, and SeaweedFS will all work in development & test)
- OpenSearch >= 1.0
  - The `analysis-icu` plugin must also be installed.
- Cantaloupe 5.0+
  - You can install and configure this yourself, but it will be easier to run a DSC image server container in Docker.
  - This will also require the AWS Command Line Interface v1.x with the `awscli-login` plugin, which is needed to obtain credentials for Cantaloupe to access the relevant S3 buckets. (awscli-login requires v1.x of the CLI as of this writing, but that would be the only reason not to upgrade to v2.x. It's also possible to install 1.x, rename the `aws` executable to something like `aws-v1`, install 2.x, and then use only `aws-v1 login` for logins.)
- exiv2 (used to extract image metadata)
- ffmpeg (used to extract video metadata)
- tesseract (used for OCR)
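The last three are command-line tools; a quick way to confirm they are installed and on your PATH (a sketch; the exact version output will vary):

$ exiv2 --version
$ ffmpeg -version
$ tesseract --version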
# Install rbenv
$ brew install rbenv
$ brew install ruby-build
$ brew install rbenv-gemset
$ rbenv init
$ rbenv rehash
# Clone the repository
$ git clone https://github.com/medusa-project/kumquat.git
$ cd kumquat
# Install Ruby into rbenv
$ rbenv install "$(< .ruby-version)"
# Install Bundler
$ gem install bundler
# Install application gems
$ bundle install
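At this point it's worth sanity-checking the toolchain; the reported versions should correspond to `.ruby-version` and the Gemfile:

$ ruby -v
$ bundle exec rails -v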
Uncomment `discovery.type: single-node` in `config/opensearch.yml`. Also add the following lines:
plugins.security.disabled: true
plugins.index_state_management.enabled: false
reindex.remote.whitelist: "localhost:*"
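If you prefer to script the edit, the three added lines can be appended from the shell (a sketch; it assumes your working directory is the OpenSearch installation root, and you still need to uncomment `discovery.type` by hand):

$ cat >> config/opensearch.yml <<'EOF'
plugins.security.disabled: true
plugins.index_state_management.enabled: false
reindex.remote.whitelist: "localhost:*"
EOF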
$ bin/opensearch-plugin install analysis-icu
$ bin/opensearch
To confirm that it's running, try to access http://localhost:9200.
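For example, with `curl`; a running node answers with a JSON document containing the cluster name and version:

$ curl http://localhost:9200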
Obtain `demo.key` and `production.key` from a team member and put them in `config/credentials`. Then:
$ cd config/credentials
$ cp template.yml development.yml
$ cp template.yml test.yml
Edit both as necessary.
See the "Configuration" section later in this file for more information about the configuration system.
$ bin/rails "opensearch:indexes:create[kumquat_development_blue]"
$ bin/rails "opensearch:indexes:create_alias[kumquat_development_blue,kumquat_development]"
$ bin/rails db:setup
$ bin/rails server
Kumquat should now be available at http://localhost:3000.
N.B.: If this command crashes on macOS, try adding the following line to your `.zshrc` file:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
Run this command TWICE:
$ bin/rails collections:sync
After the second invocation has completed, a single invocation will suffice from then on.
(From here on, we'll deal with the Champaign-Urbana Historic Built Environment collection.)
- Go to the element list on the production instance.
- Click the Export button to export it to a file.
- On your local instance, go to the element list and import the file.
(Log in as `super` / `[email protected]`.)
- Go to the vocabulary list on the production instance.
- Click the "BT" vocabulary.
- Click the Export button to export it to a file.
- On your local instance, go to the vocabulary list and import the file.
- Repeat for the "provider," "resourceType," and "rights" vocabularies.
- Go to the collection's metadata profile in the production instance.
- Click the Export button to export it to a file.
- On your local instance, go to the metadata profiles list and click the Import button to import the file.
- On your local instance, go to the collection's admin view.
- In the Technical Info tab, click Edit.
- Copy the settings from the production instance:
  - Set the File Group ID to `b3576c20-1ea8-0134-1d77-0050569601ca-6`.
  - Set the Package Profile to "Single-Item Object."
  - Set the Metadata Profile to the profile you just imported.
  - Save the collection.
- In the Access tab, click Edit and set the collection as published.
- On the command line, log into AWS: `aws login`
- Go to the admin view of the collection.
- Click the "Objects" button.
- Click the "Import" button.
- In the "Import Items" panel, make sure "Create" is checked, and click "Import." This will invoke a background job. Wait for it to complete. You can track its progress in tasks view.
- Go to the collection's admin view in production.
- Click the "Objects" button.
- Click "Metadata -> Export As TSV" and export all items to a file.
- Go to the same view on your local instance.
- Import the TSV. This will invoke a background job. When it finishes, the collection should be fully populated with metadata.
$ bin/rails db:migrate
Once created, index schemas can only be modified to a limited extent. To migrate to an incompatible schema, the procedure would be:
- Update the index schema in `app/search/index_schema.yml`.
- Create a new index with the new schema: `rails "opensearch:indexes:create[new_index]"`
- Populate the new index with documents. There are a couple of ways to do this (the alias cutover is sketched after this list):
  - If the schema change was backwards-compatible with the source documents added to the index, invoke `rails "opensearch:indexes:reindex[current_index,new_index]"`. This will reindex all source documents from the current index into the new index.
  - Otherwise, reindex all database content:

        $ rails collections:reindex
        $ rails agents:reindex
        $ rails items:reindex
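Once the new index is populated, reads presumably need to be cut over to it. Since the application goes through the alias created earlier (e.g. `kumquat_development`), one way to repoint it atomically is the standard OpenSearch `_aliases` API. This is a sketch, not a project-specific task; the index names are illustrative:

$ curl -X POST 'http://localhost:9200/_aliases' \
      -H 'Content-Type: application/json' \
      -d '{"actions": [{"remove": {"index": "current_index", "alias": "kumquat_development"}}, {"add": {"index": "new_index", "alias": "kumquat_development"}}]}'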
There are several dependent services:
- PostgreSQL
- OpenSearch
- A working Medusa Collection Registry. There are a lot of tests that rely on fixture data within Medusa. Unfortunately, production Medusa data is not stable enough to test against and it's hard to tailor for specific tests that require specific types of content. So instead, all of the tests rely on a mock of Medusa called Mockdusa.
- A Cantaloupe image server instance.
- Three S3 buckets:
- One for the Cantaloupe cache.
- One for application data.
- One containing Medusa repository data. The content exposed by Mockdusa, above, should be available in this bucket.
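For local runs outside Docker, any of the S3 servers listed under the prerequisites can provide these buckets. A sketch using MinIO's single-binary server (the data directory is arbitrary):

$ minio server /tmp/minio-data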
Due to the hassle of getting all of this running locally, there is also a `docker-compose.yml` file that will spin up all of the required services and run the tests within a containerized environment:
aws login
eval $(aws ecr get-login --region us-east-2 --no-include-email --profile default)
docker-compose pull && docker-compose up --build --exit-code-from kumquat
See the class documentation in `app/config/configuration.rb` for a detailed explanation of how the configuration system works. The short explanation is that the `development` and `test` environments rely on the unencrypted `config/credentials/development.yml` and `test.yml` files, respectively, while the `demo` and `production` environments rely on the `demo.yml.enc` and `production.yml.enc` files, which are Rails 6 encrypted credentials files.
In the production and demo environments, authorization uses the campus Active Directory via LDAP. In development and test, there is one "canned user" for each Medusa AD group:
- `user`: Library Medusa Users
- `admin`: Library Medusa Admins
- `super`: Library Medusa Super Admins

Sign in with any of these using `[username]@example.org` as the password.
The `rake doc:generate` command invokes YARD to generate HTML documentation for the code base.