This package contains scripts used to import data from vendor databases and to curate the imported data.
The general process of importing data from a vendor is as follows:
- Check for a local cache of vendor records (to avoid downloading data too frequently).
- If the cache is found, load the records. If not, download all vendor data in batches, making no more than 100 requests simultaneously.
- Transform the vendor records into "source" records.
- Retrieve the list of records for the vendor source(s) and compare them to the newly-retrieved records. If there are new records, create them. If there are updated versions of existing records, update them.
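The import flow above can be sketched roughly as follows. This is an illustration only: the function and field names (`fetchBatch`, `sid`, etc.) are hypothetical and not this package's actual API, and the record comparison is deliberately naive.

```javascript
// Download all batches, running no more than `limit` requests at a time.
async function downloadAllBatches(batches, fetchBatch, limit) {
    let results = [];
    for (let i = 0; i < batches.length; i += limit) {
        const slice = batches.slice(i, i + limit);
        results = results.concat(await Promise.all(slice.map(fetchBatch)));
    }
    return results;
}

// Compare freshly transformed "source" records against the existing ones,
// returning the records to create and the records to update.  The
// JSON.stringify comparison is key-order sensitive and stands in for a
// proper deep comparison.
function diffRecords(existing, incoming) {
    const existingById = new Map(existing.map((record) => [record.sid, record]));
    const toCreate = [];
    const toUpdate = [];
    incoming.forEach((record) => {
        const current = existingById.get(record.sid);
        if (!current) {
            toCreate.push(record);
        } else if (JSON.stringify(current) !== JSON.stringify(record)) {
            toUpdate.push(record);
        }
    });
    return { toCreate, toUpdate };
}
```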
Through the web interface and various API endpoints, by default only "unified" records are displayed. The "unifier" is a script that looks for any "source" records that are not already associated with a "unified" record. For each of these "orphaned" records it finds, the "unifier":
- Creates a new "unified" record using the same "title", "description", "manufacturer", etc. data from the "source" record.
- Updates the "source" record to indicate that it is associated with the new "unified" record.
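A minimal sketch of the unifier pass might look like this; the ID scheme and record fields shown are assumptions for illustration, not this package's actual data model.

```javascript
// Create a "unified" record for each "orphaned" source record, and
// associate the source with it.  Returns the newly created records.
function unify(sourceRecords) {
    const newUnifiedRecords = [];
    sourceRecords.forEach((source) => {
        if (!source.uid) { // "orphaned": not yet associated with a unified record
            const unified = {
                uid: "ul:" + source.sid, // hypothetical unified ID scheme
                title: source.title,
                description: source.description,
                manufacturer: source.manufacturer
            };
            newUnifiedRecords.push(unified);
            source.uid = unified.uid; // associate the source with the new record
        }
    });
    return newUnifiedRecords;
}
```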
There is a secondary set of scripts that:
- Extract image data from "source" records.
- Download the original images and cache them locally as needed.
- Update the associated "unified" record to indicate that it has image data.
For these scripts to work properly, they should be run after the "unifier" has completed its work (see below).
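The image sync pass might be sketched like this. The record shape, the cache, and the `downloadFn` parameter are all assumptions chosen for illustration.

```javascript
// For each source record: download and cache any images we do not already
// have, then flag the associated unified record as having image data.
async function syncImages(sourceRecords, unifiedById, cache, downloadFn) {
    for (const source of sourceRecords) {
        const images = source.images || [];
        for (const imageUrl of images) {
            if (!cache.has(imageUrl)) {
                // Only download images that are not already cached locally.
                cache.set(imageUrl, await downloadFn(imageUrl));
            }
        }
        const unified = unifiedById.get(source.uid);
        if (unified && images.length) {
            unified.hasImages = true; // indicate that the record has image data
        }
    }
}
```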
The basic process for performing a full sync is as follows:
- Run each of the "core" imports.
- Run the "unifier".
- Run each of the "image" imports.
This package has a few npm scripts defined to cover each part of the above. To run a "full" sync, use a command like:

```shell
npm run full-sync
```
For individual commands, see the `package.json` file. Note that all of the npm scripts are hard-coded to use the "production" configuration settings. To run in a dev environment, you will need to run the direct commands (see below).
NOTE: The GARI imports are no longer included in a "full" sync.
This package provides a range of scripts that use `fluid-launcher` to run various commands, and to adjust the effective options for each command using options files, command-line switches, and environment variables. The `configs` directory contains a range of configuration files.
Usually, you will want to work with one of the "merged" config files that combines:
- Options common to all scripts.
- Options common to the operating environment (production, development).
- Options common to all scripts that work with a particular vendor's data (EASTIN, GARI, et cetera).
- Options specific to a particular script.
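As a purely illustrative example (the option names here are invented; see the actual files in the `configs` directory for the real structure), a merged development config for an EASTIN script might combine options along these lines:

```json
{
    "vendor": "eastin",
    "environment": "development",
    "urls": { "ul": "http://localhost:6714" },
    "cacheDir": "./cache/eastin"
}
```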
All commands provided below are intended to be run from the root of the directory in which the repository has been checked out.
The Unified Listing pulls "source" records from EASTIN, which is itself a federation of various partner databases (see their website for details).
To import "core" data from EASTIN:
- In production:

  ```shell
  npm run eastin-import
  ```

- In a development environment:

  ```shell
  npm run eastin-import -- --optionsFile %ul-imports/configs/eastin-dev.json
  ```
To import "image" data from EASTIN:
- In production:

  ```shell
  npm run eastin-image-import
  ```

- In a development environment:

  ```shell
  npm run eastin-image-import -- --optionsFile %ul-imports/configs/eastin-image-sync-dev.json
  ```
The Unified Listing pulls "source" records from GARI. To import "core" data from GARI:
- In production:

  ```shell
  npm run gari-import
  ```

- In a development environment:

  ```shell
  npm run gari-import -- --optionsFile %ul-imports/configs/gari-dev.json
  ```
To import "image" data from GARI:
- In production:

  ```shell
  npm run gari-image-import
  ```

- In a development environment:

  ```shell
  npm run gari-image-import -- --optionsFile %ul-imports/configs/gari-image-sync-dev.json
  ```
The Shopping and Alerting Aid (SAI) is a front-end to the Unified Listing that assists users in finding products that meet their needs. To import "core" data from the SAI:
- In production:

  ```shell
  npm run sai-import
  ```

- In a development environment:

  ```shell
  npm run sai-import -- --optionsFile %ul-imports/configs/sai-dev.json
  ```
To import "image" data from the SAI:
- In production:

  ```shell
  npm run sai-image-import
  ```

- In a development environment:

  ```shell
  npm run sai-image-import -- --optionsFile %ul-imports/configs/sai-image-sync-dev.json
  ```
See above for details regarding the "unifier". To create a "unified" record for each "source" record that is not already associated with a "unified" record:
- In production:

  ```shell
  npm run unifier
  ```

- In a development environment:

  ```shell
  npm run unifier -- --optionsFile %ul-imports/configs/unifier-dev.json
  ```
This package includes scripts that can be used to detect and, where possible, clean up particular problems with imported data. For details, look at the contents of the `src/js/curation` directory.
This package includes scripts that make key updates to "unified" records based on data coming from the SAI:
- For each SAI record that has been flagged as `deleted` in the SAI, the associated unified record is updated to indicate that it has been deleted.
- For each SAI record that has been flagged as a "duplicate" of another ("original") record:
  - The associated unified record is updated to indicate that it has been deleted, and to redirect future requests to the "original" record.
  - All "child" records associated with a "duplicate" record are updated to be associated with the "original" record instead.
- For each SAI record that has an updated `name` or `description`, the associated unified record will have its `name` and `description` updated.
Note that image data associated with "duplicate" records is not migrated.
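The curation rules above can be sketched as follows. The field names (`status`, `duplicateOf`, `redirectTo`, etc.) are illustrative assumptions, not the actual SAI or UL record schema.

```javascript
// Apply the SAI curation rules to a single unified record and the full
// set of "source" records.
function curateFromSai(saiRecord, unified, allSources) {
    if (saiRecord.status === "deleted") {
        unified.status = "deleted";
    }
    if (saiRecord.duplicateOf) {
        // Mark the duplicate's unified record as deleted and redirect
        // future requests to the "original" record.
        unified.status = "deleted";
        unified.redirectTo = saiRecord.duplicateOf;
        // Reassociate all "child" source records with the "original".
        allSources.forEach((source) => {
            if (source.uid === unified.uid) {
                source.uid = saiRecord.duplicateOf;
            }
        });
    }
    // Propagate updated name/description to the unified record.
    if (saiRecord.name) { unified.name = saiRecord.name; }
    if (saiRecord.description) { unified.description = saiRecord.description; }
    return unified;
}
```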
To run these scripts in production, use the command:

```shell
npm run sai-curation
```
Please note that the curation script does not directly reload the data from the SAI API. Instead, it compares the SAI "source" records that have already been imported into the UL with their associated unified records. To pick up new data, you will need to first run the SAI import or a full sync (see above for details on both).
The scripts in this package use the `fluid-launcher` package to allow you to set options from the command line or from environment variables. For example, let's say you want to run a "full sync" with a custom password. You can do this using an argument or an environment variable. With an argument, you might use a command like:

```shell
npm run full-sync -- --password myPasswordValue
```

Note that the `--` characters are required to help npm understand where its arguments end and where the script's arguments begin.
Using an environment variable to set the same password, you might use a command like:

```shell
PASSWORD=myPasswordValue npm run full-sync
```
For more details about supported options, run any of the scripts in this package with the `--help` argument. You can also create your own custom options files and use those instead of the included files. For more details, see the `fluid-launcher` documentation.
Each import that is run saves data on updated records to a file. These files can be used to generate emails describing each updated record, and also a rollup HTML report that summarises all the changes for a given import. This process involves two steps:
- Look for unprocessed import output. Compare the raw updates for a given source to the original records and produce a "diff" data file. Import output is gzipped and archived at the end of this process.
- Look for unprocessed "diff" data files. Generate emails and an HTML report for each file. Incoming "diff" data files are then gzipped and archived.
There are npm scripts provided in this package to handle both steps. There is a rollup script that runs both in order, using a command like `npm run updates-report`. You can also run the steps individually; see the `package.json` file for details.
The process for making code changes within the UL API team is generally as follows:
- Create an issue in GitHub to track the work.
- Discuss the issue with the team and agree on whether/how to proceed.
- Fork the code locally if you have not already done so.
- Create a local branch whose name matches the GitHub issue. If the issue number is #3, the branch should be `GH-3`.
- Make the changes agreed in step 2. All commit messages should be prefixed with the branch/issue details, e.g. `GH-3: Updated key dependencies`.
- Create a pull request comparing your fork/branch to `main`.
- Discuss the pull request with the team and agree on how/when it will be reviewed.
- Work together with reviewer(s) to address any feedback.
A pull request is more likely to be accepted if it:
- Makes a small, clearly identified, and well described change.
- Does not break any tests.
- Includes tests for any new functionality, so that the code coverage remains high.
- Passes all linting checks. This sometimes involves reformatting code to match the existing style. It can also involve discussing and making changes to the linting checks.