Skip to content

datopian/ckanext-versioning

Repository files navigation

CKAN + data versioning 🚀. This CKAN extension adds a full data versioning capability to CKAN including:

  • Metadata and data is revisioned so that all updates create new revision and old versions of the metadata and data are accessible
  • Create and manage releases - named labels plus a description for a specific revision of a dataset, e.g. "v1.0". These are similar in concept to VCS tags.
  • Diffs, reverting etc

For more background see https://tech.datopian.com/versioning/

Requirements

ckanext-verisoning requires CKAN 2.8.4 or a newer version of CKAN 2.8. It may work with CKAN 2.9 as well but this is currently not tested.

Installation

To install ckanext-versioning:

  1. Activate your CKAN virtual environment, for example:

    . /usr/lib/ckan/default/bin/activate
    
  2. Install the ckanext-versioning Python package into your virtual environment:

    pip install ckanext-versioning
    
  3. Add package_versioning to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/production.ini).

  4. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu:

    sudo service apache2 reload
    

Configuration settings

The following CKAN INI configuration settings are required for this plugin to operate properly:

ckanext.versioning.backend_type

Should be set to a valid metastore-lib backend type, for example:

ckanext.versioning.backend_type = filesystem

ckanext.versioning.backend_config

Should be a Python dictionary containing configuration options to pass to the metastore-lib backend factory. The specific configuration options accepted for each backend are documented here.

For example, for the filesystem backend one can use:

ckanext.versioning.backend_config = {"uri":"./metastore"}

To set the metadata storage path to ./metastore on the local file system.

API Actions

This extension exposes a number of new API actions to manage and use dataset revisions and releases.

The HTTP method is GET for list / show actions and POST for create / delete actions.

You will need to also pass in authentication information such as cookies or tokens - you should consult the CKAN API Guide <https://docs.ckan.org/en/2.8/api/>_ for details.

The following curl examples all assume the $API_KEY environment variable is set and contains a valid CKAN API key, belonging to a user with sufficient privileges; Output is indented and cleaned up for readability.

dataset_release_list

List releases for a dataset.

HTTP Method: GET

Query Parameters:

  • dataset=<dataset_id> - The UUID or unique name of the dataset (required)

Example:

$ curl -H "Authorization: $API_KEY" \
  https://ckan.example.com/api/3/action/dataset_release_list?dataset=my-awesome-dataset

{
  "help": "http://ckan.example.com/api/3/action/help_show?name=dataset_release_list",
  "success": true,
  "result": [
    {
      "id": "5942ab7a-67cb-426c-ad99-dd4519530bc7",
      "package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
      "package_revision_id": "7316fb6c-07e7-43b7-ade8-ac26c5693e6d",
      "name": "Version 1.2",
      "description": "Updated to include latest study results",
      "creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
      "created": "2019-10-27 15:29:53.452833"
    },
    {
      "id": "87d6f58a-a899-4f2d-88a4-c22e9e1e5dfb",
      "package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
      "package_revision_id": "1b9fc99e-8e32-449e-85c2-24c893d9761e",
      "name": "Corrected for inflation",
      "description": "With Avi Bitter",
      "creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
      "created": "2019-10-27 15:29:16.070904"
    },
    {
      "id": "3e5601e2-1b39-43b6-b197-8040cc10036e",
      "package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
      "package_revision_id": "e30ba6a8-d453-4395-8ee5-3aa2f1ca9e1f",
      "name": "Version 1.0",
      "description": "Added another resource with index of countries",
      "creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
      "created": "2019-10-27 15:24:25.248153"
    }
  ]
}

dataset_release_show

Show info about a specific dataset release.

Note that this will show the release information - not the dataset metadata or data (see package_show_release_)

HTTP Method: GET

Query Parameters:

  • id=<dataset_release_id> - The UUID of the release to show (required)

Example:

$ curl -H "Authorization: $API_KEY" \
  https://ckan.example.com/api/3/action/dataset_release_show?id=5942ab7a-67cb-426c-ad99-dd4519530bc7

{
  "help": "http://ckan.example.com/api/3/action/help_show?name=dataset_release_show",
  "success": true,
  "result": {
    "id": "5942ab7a-67cb-426c-ad99-dd4519530bc7",
    "package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
    "package_revision_id": "7316fb6c-07e7-43b7-ade8-ac26c5693e6d",
    "name": "Version 1.2",
    "description": "Updated to include latest study results",
    "creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
    "created": "2019-10-27 15:29:53.452833"
  }
}

dataset_release_create

Create a new release for the specified dataset current revision. You are required to specify a name for the release, and can optionally specify a description.

HTTP Method: POST

JSON Parameters:

  • dataset=<dataset_id> - UUID or name of the dataset (required, string)
  • name=<release_name>`` - Name for the release. Release names must be unique per dataset (required, string)
  • description=<description> - Long description for the release; Can be markdown formatted (optional, string)

Example:

$ curl -H "Authorization: $API_KEY" \
       -H "Content-type: application/json" \
       -X POST \
       https://ckan.example.com/api/3/action/dataset_release_create \
       -d '{"dataset":"3b5a4f83-8770-4e8c-9630-c8abf6aa20f4", "name": "Version 1.3", "description": "With extra Awesome Sauce"}'

{
  "help": "https://ckan.example.com/api/3/action/help_show?name=dataset_release_create",
  "success": true,
  "result": {
    "id": "e1a77b78-dfaf-4c05-a261-ff01af10d601",
    "package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
    "package_revision_id": "96ad6e02-99cf-4598-ab10-ea80e864e505",
    "name": "Version 1.3",
    "description": "With extra Awesome Sauce",
    "creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
    "created": "2019-10-28 08:14:01.953796"
  }
}

dataset_release_delete

Delete a dataset release. This does not delete the dataset revision, just the named release pointing to it.

HTTP Method: POST

JSON Parameters:

  • id=<dataset_release_id> - The UUID of the release to delete (required, string)

Example::

$ curl -H "Authorization: $API_KEY" \
       -H "Content-type: application/json" \
       -X POST \
       https://ckan.example.com/api/3/action/dataset_release_delete \
       -d '{"id":"e1a77b78-dfaf-4c05-a261-ff01af10d601"}'

{
  "help": "https://ckan.example.com/api/3/action/help_show?name=dataset_release_delete",
  "success": true,
  "result": null
}

package_show_release

Show a dataset (AKA package) in a given release. This is identical to the built-in package_show action, but shows dataset metadata for a given release, and adds some versioning related metadata.

This is useful if you've used dataset_release_list to get all named releases for a dataset, and now want to show that dataset in a specific release.

If release_id is not specified, the latet release of the dataset will be returned, but will include a list of releases for the dataset.

HTTP Method: GET

Query Parameters:

  • id=<dataset_id> - The name or UUID of the dataset (required)
  • release_id=<release_id> - A release name to show (optional)

Examples:

Fetching dataset metadata in a specified release:

$ curl -H "Authorization: $API_KEY" \
       'https://ckan.example.com/api/3/action/package_show_release?id=3b5a4f83-8770-4e8c-9630-c8abf6aa20f4&release_id=5942ab7a-67cb-426c-ad99-dd4519530bc7'

{
  "help": "https://ckan.example.com/api/3/action/help_show?name=package_show_release",
  "success": true,
  "result": {
    "maintainer": "Bob Paulson",
    "relationships_as_object": [],
    "private": true,
    "maintainer_email": "",
    "num_releases": 2,

    "release_metadata": {
      "id": "5942ab7a-67cb-426c-ad99-dd4519530bc7",
      "package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
      "package_revision_id": "7316fb6c-07e7-43b7-ade8-ac26c5693e6d",
      "name": "Version 1.2",
      "description": "Without Avi Bitter",
      "creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
      "created": "2019-10-27 15:29:53.452833"
    },

    "id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
    "metadata_created": "2019-10-27T15:23:50.612130",
    "owner_org": "68f832f7-5952-4cac-8803-4af55c021ccd",
    "metadata_modified": "2019-10-27T20:14:42.564886",
    "author": "Joe Bloggs",
    "author_email": "",
    "state": "active",
    "version": "1.0",
    "type": "dataset",
    "resources": [
      {
        "cache_last_updated": null,
        "cache_url": null,
        "mimetype_inner": null,
        /// ... standard resource attributes ...
      }
    ],
    "num_resources": 1,

    /// ... more standard dataset attributes ...
  }
}

Note the release_metadata, which is only included with dataset metadata if the release_id parameter was provided.

Fetching the current revision of dataset metadata in a specified release:

{
  "help": "https://ckan.example.com/api/3/action/help_show?name=package_show_release",
  "success": true,
  "result": {
    "license_title": "Green",
    "relationships_as_object": [],
    "private": true,
    "id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
    "metadata_created": "2019-10-27T15:23:50.612130",
    "metadata_modified": "2019-10-27T20:14:42.564886",
    "author": "Joe Bloggs",
    "author_email": "",
    "state": "active",
    "release": "1.0",
    "creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
    "type": "dataset",
    "resources": [
      {
        "mimetype": "text/csv",
        "cache_url": null,
        "hash": "",
        "description": "",
        "name": "https://data.example.com/dataset/287f7e34-7675-49a9-90bd-7c6a8b55698e/resource.csv",
        "format": "CSV",
        /// ... standard resource attributes ...
      }
    ],
    "num_resources": 1,
    "releases": [
      {
        "vocabulary_id": null,
        "state": "active",
        "display_name": "bar",
        "id": "686198e2-7b9c-4986-bb19-3cf74cfe2552",
        "name": "bar"
      },
      {
        "vocabulary_id": null,
        "state": "active",
        "display_name": "foo",
        "id": "82259424-aec6-428c-a682-0b3f6b8ee67d",
        "name": "foo"
      }
    ],

    "releases": [
      {
        "id": "5942ab7a-67cb-426c-ad99-dd4519530bc7",
        "package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
        "package_revision_id": "7316fb6c-07e7-43b7-ade8-ac26c5693e6d",
        "name": "Version 1.2",
        "description": "Fixed some inaccuracies in data",
        "creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
        "created": "2019-10-27 15:29:53.452833"
      },
      {
        "id": "87d6f58a-a899-4f2d-88a4-c22e9e1e5dfb",
        "package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
        "package_revision_id": "1b9fc99e-8e32-449e-85c2-24c893d9761e",
        "name": "version 1.1",
        "description": "Adjusted for country-specific inflation",
        "creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
        "created": "2019-10-27 15:29:16.070904"
      }
    ],

    /// ... more standard dataset attributes ...
  }
}

Note the releases list, only included when showing the latest dataset release via package_show_release.

Config Settings

This extension does not provide any additional configuration settings.

Development Installation

To install ckanext-versioning for development, activate your CKAN virtualenv and do:

git clone https://github.com/datopian/ckanext-versioning.git
cd ckanext-versioning
python setup.py develop
pip install -r dev-requirements.txt

Running the Tests

To run the tests, do:

make test
make test TEST_PATH=test_file.py # to run all the tests of a specific file.
make test TEST_PATH=test_file.py:Class # to run all the tests of a specific Class.
make test TEST_PATH=test_file.py:Class.test_name # to execute a specific test.

To run the tests and produce a coverage report, first make sure you have coverage installed in your virtualenv (pip install coverage) then run:

make test coverage

Note that for tests to run properly, you need to have this extension installed in an environment that has CKAN installed in it, and configured to access a local PostgreSQL and Solr instances.

You can specify the path to your local CKAN installation by adding:

make test CKAN_PATH=../../src/ckan/

For example.

In addition, the following environment variables are useful when testing:

CKAN_SQLALCHEMY_URL=postgres://ckan:ckan@my-postgres-db/ckan_test
CKAN_SOLR_URL=http://my-solr-instance:8983/solr/ckan