CKAN + data versioning 🚀. This CKAN extension adds a full data versioning capability to CKAN including:
- Metadata and data is revisioned so that all updates create new revision and old versions of the metadata and data are accessible
- Create and manage releases - named labels plus a description for a specific revision of a dataset, e.g. "v1.0". These are similar in concept to VCS tags.
- Diffs, reverting etc
For more background see https://tech.datopian.com/versioning/
ckanext-verisoning requires CKAN 2.8.4 or a newer version of CKAN 2.8. It may work with CKAN 2.9 as well but this is currently not tested.
To install ckanext-versioning:
-
Activate your CKAN virtual environment, for example:
. /usr/lib/ckan/default/bin/activate
-
Install the ckanext-versioning Python package into your virtual environment:
pip install ckanext-versioning
-
Add
package_versioning
to theckan.plugins
setting in your CKAN config file (by default the config file is located at/etc/ckan/default/production.ini
). -
Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu:
sudo service apache2 reload
The following CKAN INI configuration settings are required for this plugin to operate properly:
Should be set to a valid metastore-lib backend type, for example:
ckanext.versioning.backend_type = filesystem
Should be a Python dictionary containing configuration options to pass to the metastore-lib backend factory. The specific configuration options accepted for each backend are documented here.
For example, for the filesystem
backend one can use:
ckanext.versioning.backend_config = {"uri":"./metastore"}
To set the metadata storage path to ./metastore
on the local file system.
This extension exposes a number of new API actions to manage and use dataset revisions and releases.
The HTTP method is GET for list / show actions and POST for create / delete actions.
You will need to also pass in authentication information such as cookies or
tokens - you should consult the CKAN API Guide <https://docs.ckan.org/en/2.8/api/>
_ for details.
The following curl
examples all assume the $API_KEY
environment
variable is set and contains a valid CKAN API key, belonging to a user with
sufficient privileges; Output is indented and cleaned up for readability.
List releases for a dataset.
HTTP Method: GET
Query Parameters:
dataset=<dataset_id>
- The UUID or unique name of the dataset (required)
Example:
$ curl -H "Authorization: $API_KEY" \
https://ckan.example.com/api/3/action/dataset_release_list?dataset=my-awesome-dataset
{
"help": "http://ckan.example.com/api/3/action/help_show?name=dataset_release_list",
"success": true,
"result": [
{
"id": "5942ab7a-67cb-426c-ad99-dd4519530bc7",
"package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"package_revision_id": "7316fb6c-07e7-43b7-ade8-ac26c5693e6d",
"name": "Version 1.2",
"description": "Updated to include latest study results",
"creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
"created": "2019-10-27 15:29:53.452833"
},
{
"id": "87d6f58a-a899-4f2d-88a4-c22e9e1e5dfb",
"package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"package_revision_id": "1b9fc99e-8e32-449e-85c2-24c893d9761e",
"name": "Corrected for inflation",
"description": "With Avi Bitter",
"creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
"created": "2019-10-27 15:29:16.070904"
},
{
"id": "3e5601e2-1b39-43b6-b197-8040cc10036e",
"package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"package_revision_id": "e30ba6a8-d453-4395-8ee5-3aa2f1ca9e1f",
"name": "Version 1.0",
"description": "Added another resource with index of countries",
"creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
"created": "2019-10-27 15:24:25.248153"
}
]
}
Show info about a specific dataset release.
Note that this will show the release information - not the dataset metadata or
data (see package_show_release
_)
HTTP Method: GET
Query Parameters:
id=<dataset_release_id>
- The UUID of the release to show (required)
Example:
$ curl -H "Authorization: $API_KEY" \
https://ckan.example.com/api/3/action/dataset_release_show?id=5942ab7a-67cb-426c-ad99-dd4519530bc7
{
"help": "http://ckan.example.com/api/3/action/help_show?name=dataset_release_show",
"success": true,
"result": {
"id": "5942ab7a-67cb-426c-ad99-dd4519530bc7",
"package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"package_revision_id": "7316fb6c-07e7-43b7-ade8-ac26c5693e6d",
"name": "Version 1.2",
"description": "Updated to include latest study results",
"creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
"created": "2019-10-27 15:29:53.452833"
}
}
Create a new release for the specified dataset current revision. You are required to specify a name for the release, and can optionally specify a description.
HTTP Method: POST
JSON Parameters:
dataset=<dataset_id>
- UUID or name of the dataset (required, string)name
=<release_name>`` - Name for the release. Release names must be unique per dataset (required, string)description=<description>
- Long description for the release; Can be markdown formatted (optional, string)
Example:
$ curl -H "Authorization: $API_KEY" \
-H "Content-type: application/json" \
-X POST \
https://ckan.example.com/api/3/action/dataset_release_create \
-d '{"dataset":"3b5a4f83-8770-4e8c-9630-c8abf6aa20f4", "name": "Version 1.3", "description": "With extra Awesome Sauce"}'
{
"help": "https://ckan.example.com/api/3/action/help_show?name=dataset_release_create",
"success": true,
"result": {
"id": "e1a77b78-dfaf-4c05-a261-ff01af10d601",
"package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"package_revision_id": "96ad6e02-99cf-4598-ab10-ea80e864e505",
"name": "Version 1.3",
"description": "With extra Awesome Sauce",
"creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
"created": "2019-10-28 08:14:01.953796"
}
}
Delete a dataset release. This does not delete the dataset revision, just the named release pointing to it.
HTTP Method: POST
JSON Parameters:
id=<dataset_release_id>
- The UUID of the release to delete (required, string)
Example::
$ curl -H "Authorization: $API_KEY" \
-H "Content-type: application/json" \
-X POST \
https://ckan.example.com/api/3/action/dataset_release_delete \
-d '{"id":"e1a77b78-dfaf-4c05-a261-ff01af10d601"}'
{
"help": "https://ckan.example.com/api/3/action/help_show?name=dataset_release_delete",
"success": true,
"result": null
}
Show a dataset (AKA package) in a given release. This is identical to the
built-in package_show
action, but shows dataset metadata for a given
release, and adds some versioning related metadata.
This is useful if you've used dataset_release_list
to get all
named releases for a dataset, and now want to show that dataset in a specific
release.
If release_id
is not specified, the latet release of the dataset will be
returned, but will include a list of releases for the dataset.
HTTP Method: GET
Query Parameters:
id=<dataset_id>
- The name or UUID of the dataset (required)release_id=<release_id>
- A release name to show (optional)
Examples:
Fetching dataset metadata in a specified release:
$ curl -H "Authorization: $API_KEY" \
'https://ckan.example.com/api/3/action/package_show_release?id=3b5a4f83-8770-4e8c-9630-c8abf6aa20f4&release_id=5942ab7a-67cb-426c-ad99-dd4519530bc7'
{
"help": "https://ckan.example.com/api/3/action/help_show?name=package_show_release",
"success": true,
"result": {
"maintainer": "Bob Paulson",
"relationships_as_object": [],
"private": true,
"maintainer_email": "",
"num_releases": 2,
"release_metadata": {
"id": "5942ab7a-67cb-426c-ad99-dd4519530bc7",
"package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"package_revision_id": "7316fb6c-07e7-43b7-ade8-ac26c5693e6d",
"name": "Version 1.2",
"description": "Without Avi Bitter",
"creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
"created": "2019-10-27 15:29:53.452833"
},
"id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"metadata_created": "2019-10-27T15:23:50.612130",
"owner_org": "68f832f7-5952-4cac-8803-4af55c021ccd",
"metadata_modified": "2019-10-27T20:14:42.564886",
"author": "Joe Bloggs",
"author_email": "",
"state": "active",
"version": "1.0",
"type": "dataset",
"resources": [
{
"cache_last_updated": null,
"cache_url": null,
"mimetype_inner": null,
/// ... standard resource attributes ...
}
],
"num_resources": 1,
/// ... more standard dataset attributes ...
}
}
Note the release_metadata
, which is only included with dataset metadata if
the release_id
parameter was provided.
Fetching the current revision of dataset metadata in a specified release:
{
"help": "https://ckan.example.com/api/3/action/help_show?name=package_show_release",
"success": true,
"result": {
"license_title": "Green",
"relationships_as_object": [],
"private": true,
"id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"metadata_created": "2019-10-27T15:23:50.612130",
"metadata_modified": "2019-10-27T20:14:42.564886",
"author": "Joe Bloggs",
"author_email": "",
"state": "active",
"release": "1.0",
"creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
"type": "dataset",
"resources": [
{
"mimetype": "text/csv",
"cache_url": null,
"hash": "",
"description": "",
"name": "https://data.example.com/dataset/287f7e34-7675-49a9-90bd-7c6a8b55698e/resource.csv",
"format": "CSV",
/// ... standard resource attributes ...
}
],
"num_resources": 1,
"releases": [
{
"vocabulary_id": null,
"state": "active",
"display_name": "bar",
"id": "686198e2-7b9c-4986-bb19-3cf74cfe2552",
"name": "bar"
},
{
"vocabulary_id": null,
"state": "active",
"display_name": "foo",
"id": "82259424-aec6-428c-a682-0b3f6b8ee67d",
"name": "foo"
}
],
"releases": [
{
"id": "5942ab7a-67cb-426c-ad99-dd4519530bc7",
"package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"package_revision_id": "7316fb6c-07e7-43b7-ade8-ac26c5693e6d",
"name": "Version 1.2",
"description": "Fixed some inaccuracies in data",
"creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
"created": "2019-10-27 15:29:53.452833"
},
{
"id": "87d6f58a-a899-4f2d-88a4-c22e9e1e5dfb",
"package_id": "3b5a4f83-8770-4e8c-9630-c8abf6aa20f4",
"package_revision_id": "1b9fc99e-8e32-449e-85c2-24c893d9761e",
"name": "version 1.1",
"description": "Adjusted for country-specific inflation",
"creator_user_id": "70587302-6a93-4c0a-bb3e-4d64c0b7c213",
"created": "2019-10-27 15:29:16.070904"
}
],
/// ... more standard dataset attributes ...
}
}
Note the releases
list, only included when showing the latest
dataset release via package_show_release
.
This extension does not provide any additional configuration settings.
To install ckanext-versioning for development, activate your CKAN virtualenv and do:
git clone https://github.com/datopian/ckanext-versioning.git
cd ckanext-versioning
python setup.py develop
pip install -r dev-requirements.txt
To run the tests, do:
make test
make test TEST_PATH=test_file.py # to run all the tests of a specific file.
make test TEST_PATH=test_file.py:Class # to run all the tests of a specific Class.
make test TEST_PATH=test_file.py:Class.test_name # to execute a specific test.
To run the tests and produce a coverage report, first make sure you have
coverage installed in your virtualenv (pip install coverage
) then run:
make test coverage
Note that for tests to run properly, you need to have this extension installed in an environment that has CKAN installed in it, and configured to access a local PostgreSQL and Solr instances.
You can specify the path to your local CKAN installation by adding:
make test CKAN_PATH=../../src/ckan/
For example.
In addition, the following environment variables are useful when testing:
CKAN_SQLALCHEMY_URL=postgres://ckan:ckan@my-postgres-db/ckan_test
CKAN_SOLR_URL=http://my-solr-instance:8983/solr/ckan