Repo for the Tanagra service being developed by the All of Us DRC

DataBiosphere/tanagra

Tanagra

Tanagra is a project to build a configurable cohort builder and data explorer. Our goal is to make it easy to set up a new dataset for exploring with little or no custom code required, so everything we've built is configuration-driven.

Project overview

The project has three main pieces: indexer, service, UI. All three pieces are highly interconnected and are not intended to be used or deployed separately. Everything lives in this single GitHub repository.

The indexer takes the source dataset and produces a logical copy that's better suited to the types of queries the UI needs to run. It denormalizes some data, precomputes some things, and reorganizes tables. The goal is not to meet some query benchmark, only to have the UI not time out.
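As a rough illustration of what "denormalizes some data, precomputes some things" can mean, the sketch below flattens a few small source tables into one row per person with a precomputed count. The table names, column names, and the `build_person_index` function are assumptions made for this example; this is not Tanagra's indexer code.

```python
from collections import Counter

# Hypothetical source tables (names and columns are illustrative only).
persons = [
    {"person_id": 1, "gender_concept_id": 8507},
    {"person_id": 2, "gender_concept_id": 8532},
]
concepts = {8507: "MALE", 8532: "FEMALE"}
condition_occurrences = [
    {"person_id": 1, "condition": "diabetes"},
    {"person_id": 1, "condition": "asthma"},
    {"person_id": 2, "condition": "asthma"},
]


def build_person_index(persons, concepts, condition_occurrences):
    """Return one flat row per person with display names and counts inlined."""
    counts = Counter(row["person_id"] for row in condition_occurrences)
    return [
        {
            "person_id": p["person_id"],
            # Denormalize: copy the display name so queries need no join.
            "gender": concepts[p["gender_concept_id"]],
            # Precompute: store the count instead of aggregating per query.
            "condition_count": counts[p["person_id"]],
        }
        for p in persons
    ]


person_index = build_person_index(persons, concepts, condition_occurrences)
```

The trade-off is the usual one for an index: the copy costs extra storage and must be rebuilt when the source changes, in exchange for queries that avoid joins and aggregations at request time.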

The service processes queries for the UI and manages the application database, which stores user-managed artifacts like cohorts and data feature sets.

The UI includes the cohort builder, data feature set builder, export, and cohort review interfaces.

Configure a new dataset

Tanagra supports data patterns, rather than specific SQL schemas. Check the list of currently supported patterns to see how they map to your dataset.

Tanagra defines a custom object model on top of the underlying relational data. The dataset configuration language is based on this object model, so it's helpful to be familiar with the main concepts.

A dataset configuration is spread across multiple files to improve readability and make sharing across datasets easier. See an overview of the different files and directory structure, as well as pointers to example files. Check the full dataset configuration schema documentation to look up specific properties. The protocol buffers used for visualizations and criteria plugins are documented separately.

Set up a new deployment

Choose a deployment pattern and configure the GCP project(s).

Once you've defined the configuration files for a dataset, run the indexer. Check the full indexer CLI documentation to look up specific commands.

Tanagra does not provide an API for managing access control for a population of users. Instead, we provide an interface for calling an external access control service. For example, the VUMC admin service serves as the external access control service for the SD deployment. Either reuse an existing access control implementation, or add your own.
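The pattern of delegating access decisions to an external service can be sketched as follows. This is a minimal sketch under stated assumptions: `AccessControl`, `ExternalServiceAccessControl`, `is_authorized`, and the permission format are hypothetical names invented for this example, not Tanagra's actual interface, and the external service is replaced by a local stub.

```python
from typing import Callable, Protocol, Set, Tuple

Permission = Tuple[str, str]  # (action, resource_id); hypothetical format


class AccessControl(Protocol):
    """Hypothetical plugin interface; not Tanagra's actual API."""

    def is_authorized(self, user_email: str, action: str, resource_id: str) -> bool:
        ...


class ExternalServiceAccessControl:
    """Delegates each decision to an external service via an injected callable."""

    def __init__(self, fetch_permissions: Callable[[str], Set[Permission]]):
        # In a real deployment this callable would wrap a request to the
        # external access control service.
        self._fetch_permissions = fetch_permissions

    def is_authorized(self, user_email: str, action: str, resource_id: str) -> bool:
        return (action, resource_id) in self._fetch_permissions(user_email)


def fake_external_service(user_email: str) -> Set[Permission]:
    """Stub standing in for the external service in this example."""
    table = {"alice@example.com": {("read", "cohort-1")}}
    return table.get(user_email, set())


access_control = ExternalServiceAccessControl(fake_external_service)
allowed = access_control.is_authorized("alice@example.com", "read", "cohort-1")
```

Injecting the lookup as a callable keeps the decision logic testable without a network call, and lets a deployment swap in its own service client.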

We expect deployments to require varied methods of exporting data. Either reuse an existing export implementation, or add your own.

Check the full application configuration documentation to look up specific deployment properties.

Once your deployment is up and running, create a regression test suite and run it regularly to detect unexpected changes caused by configuration or underlying data changes.
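The core idea of such a regression check is to compare current query results against a stored snapshot. The sketch below shows that comparison for cohort counts; the function name, query names, and counts are all made up for illustration, and this is not the format Tanagra's regression tests use.

```python
def find_regressions(expected_counts, actual_counts):
    """Return {query_name: (expected, actual)} for every count that changed."""
    diffs = {}
    for name, expected in expected_counts.items():
        # A query missing from the current run is reported with actual=None.
        actual = actual_counts.get(name)
        if actual != expected:
            diffs[name] = (expected, actual)
    return diffs


# Snapshot recorded earlier versus counts from the current run (made-up data).
expected = {"diabetes_cohort": 120, "asthma_cohort": 45}
actual = {"diabetes_cohort": 120, "asthma_cohort": 47}

regressions = find_regressions(expected, actual)
```

A non-empty result does not by itself say whether the change is a bug or an intended data refresh; it only flags that someone should look.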

Manage releases

Tanagra supports multiple deployments, all with different release cadences. See more details about the codebase versioning and release process, and how you can manage the version for a specific deployment.

When you're planning to bump a deployment to a newer version of this codebase, use this tool to diff two release tags.

Contribute to the codebase

Check the guidelines for developers, including instructions for getting things running locally on your machine.

See an overview of the codebase structure, and information specifically about the UI.

All documentation links

These are all linked in the sections above. They are repeated here as a list in case you already know what you're looking for.

Project overview

Configure a new dataset

Set up a new deployment

Manage releases

Contribute to the codebase

Codebase test status

Underlay Tests

Indexer Tests

Service (PostGres) Tests

Service (MariaDB) Tests

UI Tests

UI Integration Tests

Generated Files