diff --git a/component_scope_kingfisher.ipynb b/component_scope_kingfisher.ipynb index 7d37953..576bf79 100644 --- a/component_scope_kingfisher.ipynb +++ b/component_scope_kingfisher.ipynb @@ -1,433 +1,435 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "toc_visible": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - } + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true }, - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "73ktdYktAVoO" - }, - "source": [ - "## Check scope" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "T4Gb2g11BXwt" - }, - "source": [ - "\n", - "\n", - "Use this section to check:\n", - "\n", - "* how many releases, records and compiled releases your data contains\n", - "* what stages of the contracting process your data covers\n", - "* what date range your data covers\n", - "\n", - "If you are preparing an [Ad-hoc structure and format feedback](https://docs.google.com/document/d/1_k7eA2rI-k5EH8VESkVAB73wa_qrpplL-7dKgMLTGZc/edit#heading=h.i7tpu8c49dcv), you might skip this section." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ycCwGfGkA6au" - }, - "source": [ - "### Release and record counts" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "s-i5qCmvCsHA" - }, - "source": [ - "Collections in Kingfisher Process contain either [releases](https://standard.open-contracting.org/latest/en/schema/reference/), [records](https://standard.open-contracting.org/latest/en/schema/records_reference/) or [compiled releases](https://standard.open-contracting.org/latest/en/schema/records_reference/#compiled-release). Kingfisher Process creates compiled release collections from release or record collections.\n", - "\n", - "Use this section to check that the data contains the expected number of releases, records and compiled releases. 
Where possible, you should check these numbers against the total number of results available in the frontend of the data source.\n", - "\n", - "Count the number of releases, records and compiled releases, for each collection.\n", - "\n", - "**Note:** These columns are not yet populated in version 2 of Kingfisher Process. Comment on [this issue](https://github.com/open-contracting/kingfisher-process/issues/370) to prioritize it." - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "MVJ6sIWeAZzD" - }, - "source": [ - "%%sql\n", - "\n", - "SELECT\n", - " id AS collection_id,\n", - " cached_releases_count AS releases_count,\n", - " cached_records_count AS records_count,\n", - " cached_compiled_releases_count AS compiled_releases_count\n", - "FROM\n", - " collection\n", - "WHERE\n", - " id IN :collection_ids\n" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "HD9LE4SLCzPg" - }, - "source": [ - "### Contracting process stages" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "o5hZPAo5yHMm" - }, - "source": [ - "Use this section to check that the data covers the expected stages of the contracting process." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Tfq5haelyH3E" - }, - "source": [ - "#### Release tags" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "cTZiHSkvC3__" - }, - "source": [ - "[Release tags](https://standard.open-contracting.org/latest/en/schema/codelists/#release-tag) indicate the stage of a contracting process to which a release is related." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "EqsaSZj6DkuS" - }, - "source": [ - "Count the number of releases, for each release tag:" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "WwgeZMX5Ame4" - }, - "source": [ - "%%sql\n", - "\n", - "SELECT\n", - " collection_id,\n", - " release_type,\n", - " tag,\n", - " count(*)\n", - "FROM\n", - " release_summary\n", - "GROUP BY\n", - " collection_id,\n", - " release_type,\n", - " tag\n", - "ORDER BY\n", - " collection_id\n" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "vTzQeN_kyJ_E" - }, - "source": [ - "#### Objects per stage" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8EmYSnoDnOmT" - }, - "source": [ - "In OCDS, data is organized into objects, for each stage of a contracting process. Each compiled release has: at most one `Planning` object, at most one `Tender` object, any number of `Award` objects, and any number of `Contract` objects. Each `Contract` object has at most one `Implementation` object. As such, the number of `Award` objects can exceed the number of unique OCIDs, but the number of `Tender` objects can't." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "DtRec1tNnPz6" - }, - "source": [ - "Plot a count of objects per stage:" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "Oe8W_HLCKUsO" - }, - "source": [ - "%%sql objects_per_stage <<\n", - "\n", - "SELECT\n", - " CASE WHEN paths.path = 'contracts/implementation' THEN\n", - " 'implementation'\n", - " ELSE\n", - " paths.path\n", - " END AS stage,\n", - " CASE WHEN paths.path IN ('planning', 'tender', 'contracts/implementation') THEN\n", - " GREATEST (object_property, 0)\n", - " ELSE\n", - " GREATEST (array_count, 0)\n", - " END AS object_count\n", - "FROM (\n", - " SELECT\n", - " unnest(ARRAY['planning', 'tender', 'awards', 'contracts', 'contracts/implementation']) AS path) AS paths\n", - " LEFT JOIN (\n", - " SELECT\n", - " *\n", - " FROM\n", - " field_counts\n", - " WHERE\n", - " collection_id IN :collection_ids\n", - " AND release_type = 'compiled_release'\n", - " AND path IN ('planning', 'tender', 'awards', 'contracts', 'contracts/implementation')) AS field_counts USING (path)\n" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "mKo6Q4HimvQZ" - }, - "source": [ - "plot_objects_per_stage(objects_per_stage)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9Ui1BMMBFgGu" - }, - "source": [ - "### Date ranges" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ybx8YrW7hRWC" - }, - "source": [ - "\n", - "Use this section to check that the data covers the expected date range." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "V5RSYoTSHRZE" - }, - "source": [ - "Calculate the earliest and latest `date`, `awards/date` and `contracts/dateSigned`:" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "qy0VRQ3IDwsG" - }, - "source": [ - "%%sql\n", - "\n", - "SELECT\n", - " collection_id,\n", - " release_type,\n", - " 'release_date' AS date_type,\n", - " min(date) AS min,\n", - " max(date) AS max\n", - "FROM\n", - " release_summary\n", - "GROUP BY\n", - " collection_id,\n", - " release_type,\n", - " date_type\n", - "UNION ALL\n", - "SELECT\n", - " collection_id,\n", - " release_type,\n", - " 'award_date' AS date_type,\n", - " min(first_award_date) AS min,\n", - " max(last_award_date) AS max\n", - "FROM\n", - " release_summary\n", - "GROUP BY\n", - " collection_id,\n", - " release_type,\n", - " date_type\n", - "UNION ALL\n", - "SELECT\n", - " collection_id,\n", - " release_type,\n", - " 'contract_datesigned' AS date_type,\n", - " min(first_contract_datesigned) AS min,\n", - " max(last_contract_datesigned) AS max\n", - "FROM\n", - " release_summary\n", - "GROUP BY\n", - " collection_id,\n", - " release_type\n", - "ORDER BY\n", - " collection_id,\n", - " release_type,\n", - " date_type;\n", - "\n" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "daHiVoJkexWi" - }, - "source": [ - "### Release date distribution\n", - "\n", - "Use this section to check that releases are distributed as expected." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "DynLHB_12cZ3" - }, - "source": [ - "Plot the count of releases per month:" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "0xTqCh28LOQj" - }, - "source": [ - "%%sql release_dates <<\n", - "\n", - "SELECT\n", - " collection_id::text,\n", - " release_type,\n", - " date,\n", - " count(*) AS release_count\n", - "FROM\n", - " release_summary rs\n", - "WHERE\n", - " collection_id IN :collection_ids\n", - "GROUP BY\n", - " collection_id,\n", - " release_type,\n", - " date\n", - "ORDER BY\n", - " date ASC;\n", - "\n" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "OM2aiiJzAmzM" - }, - "source": [ - "# Resample by month\n", - "release_dates['date'] = release_dates['date'].dt.strftime('%Y-%m')\n", - "release_dates = release_dates.groupby(['collection_id', 'release_type', 'date']).agg({'release_count': 'sum'}).reset_index()\n", - "\n", - "plot_releases_by_month(release_dates)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ugoDOjsdkLXN" - }, - "source": [ - "### Extensions" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9cxNPLXN8wwc" - }, - "source": [ - "Use this section to check which extensions the data uses." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "U7iYgIdW8z1c" - }, - "source": [ - "List the extensions declared in the package metadata:" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "qFbWPY5Eq7fv" - }, - "source": [ - "%%sql\n", - "\n", - "SELECT\n", - " collection_id,\n", - " release_type,\n", - " jsonb_array_elements(package_data -> 'extensions') AS ocds_extension,\n", - " count(*) AS count\n", - "FROM\n", - " release_summary\n", - "WHERE\n", - " collection_id IN :collection_ids\n", - " AND package_data IS NOT NULL\n", - "GROUP BY\n", - " collection_id,\n", - " release_type,\n", - " ocds_extension\n", - "ORDER BY\n", - " collection_id,\n", - " release_type,\n", - " count DESC;\n", - "\n" - ], - "execution_count": null, - "outputs": [] - } - ] + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "73ktdYktAVoO" + }, + "source": [ + "## Check scope" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T4Gb2g11BXwt" + }, + "source": [ + "\n", + "\n", + "Use this section to check:\n", + "\n", + "* how many releases, records and compiled releases your data contains\n", + "* what stages of the contracting process your data covers\n", + "* what date range your data covers\n", + "\n", + "If you are preparing an [Ad-hoc structure and format feedback](https://docs.google.com/document/d/1_k7eA2rI-k5EH8VESkVAB73wa_qrpplL-7dKgMLTGZc/edit#heading=h.i7tpu8c49dcv), you might skip this section." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ycCwGfGkA6au" + }, + "source": [ + "### Release and record counts" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "s-i5qCmvCsHA" + }, + "source": [ + "Collections in Kingfisher Process contain either [releases](https://standard.open-contracting.org/latest/en/schema/reference/), [records](https://standard.open-contracting.org/latest/en/schema/records_reference/) or [compiled releases](https://standard.open-contracting.org/latest/en/schema/records_reference/#compiled-release). Kingfisher Process creates compiled release collections from release or record collections.\n", + "\n", + "Use this section to check that the data contains the expected number of releases, records and compiled releases. Where possible, you should check these numbers against the total number of results available in the frontend of the data source.\n", + "\n", + "Count the number of releases, records and compiled releases, for each collection.\n", + "\n", + "**Note:** These columns are not yet populated in version 2 of Kingfisher Process. Comment on [this issue](https://github.com/open-contracting/kingfisher-process/issues/370) to prioritize it." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "MVJ6sIWeAZzD" + }, + "source": [ + "%%sql\n", + "\n", + "SELECT\n", + " id AS collection_id,\n", + " cached_releases_count AS releases_count,\n", + " cached_records_count AS records_count,\n", + " cached_compiled_releases_count AS compiled_releases_count\n", + "FROM\n", + " collection\n", + "WHERE\n", + " id IN :collection_ids\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HD9LE4SLCzPg" + }, + "source": [ + "### Contracting process stages" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o5hZPAo5yHMm" + }, + "source": [ + "Use this section to check that the data covers the expected stages of the contracting process." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Tfq5haelyH3E" + }, + "source": [ + "#### Release tags" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cTZiHSkvC3__" + }, + "source": [ + "[Release tags](https://standard.open-contracting.org/latest/en/schema/codelists/#release-tag) indicate the stage of a contracting process to which a release is related." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EqsaSZj6DkuS" + }, + "source": [ + "Count the number of releases, for each release tag:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "WwgeZMX5Ame4" + }, + "source": [ + "%%sql\n", + "\n", + "SELECT\n", + " collection_id,\n", + " release_type,\n", + " tag,\n", + " count(*)\n", + "FROM\n", + " release_summary\n", + "GROUP BY\n", + " collection_id,\n", + " release_type,\n", + " tag\n", + "ORDER BY\n", + " collection_id\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vTzQeN_kyJ_E" + }, + "source": [ + "#### Objects per stage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8EmYSnoDnOmT" + }, + "source": [ + "In OCDS, data is organized into objects, for each stage of a contracting process. Each compiled release has: at most one `Planning` object, at most one `Tender` object, any number of `Award` objects, and any number of `Contract` objects. Each `Contract` object has at most one `Implementation` object. As such, the number of `Award` objects can exceed the number of unique OCIDs, but the number of `Tender` objects can't." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DtRec1tNnPz6" + }, + "source": [ + "Plot a count of objects per stage:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Oe8W_HLCKUsO" + }, + "source": [ + "%%sql objects_per_stage <<\n", + "\n", + "SELECT\n", + " CASE WHEN paths.path = 'contracts/implementation' THEN\n", + " 'implementation'\n", + " ELSE\n", + " paths.path\n", + " END AS stage,\n", + " CASE WHEN paths.path IN ('planning', 'tender', 'contracts/implementation') THEN\n", + " GREATEST (object_property, 0)\n", + " ELSE\n", + " GREATEST (array_count, 0)\n", + " END AS object_count\n", + "FROM (\n", + " SELECT\n", + " unnest(ARRAY['planning', 'tender', 'awards', 'contracts', 'contracts/implementation']) AS path) AS paths\n", + " LEFT JOIN (\n", + " SELECT\n", + " *\n", + " FROM\n", + " field_counts\n", + " WHERE\n", + " collection_id IN :collection_ids\n", + " AND release_type = 'compiled_release'\n", + " AND path IN ('planning', 'tender', 'awards', 'contracts', 'contracts/implementation')) AS field_counts USING (path)\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "mKo6Q4HimvQZ" + }, + "source": [ + "plot_objects_per_stage(objects_per_stage)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9Ui1BMMBFgGu" + }, + "source": [ + "### Date ranges" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ybx8YrW7hRWC" + }, + "source": [ + "\n", + "Use this section to check that the data covers the expected date range." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "V5RSYoTSHRZE" + }, + "source": [ + "Calculate the earliest and latest `date`, `awards/date` and `contracts/dateSigned`:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qy0VRQ3IDwsG" + }, + "source": [ + "%%sql\n", + "\n", + "SELECT\n", + " collection_id,\n", + " release_type,\n", + " 'release_date' AS date_type,\n", + " min(date) AS min,\n", + " max(date) AS max\n", + "FROM\n", + " release_summary\n", + "GROUP BY\n", + " collection_id,\n", + " release_type,\n", + " date_type\n", + "UNION ALL\n", + "SELECT\n", + " collection_id,\n", + " release_type,\n", + " 'award_date' AS date_type,\n", + " min(first_award_date) AS min,\n", + " max(last_award_date) AS max\n", + "FROM\n", + " release_summary\n", + "GROUP BY\n", + " collection_id,\n", + " release_type,\n", + " date_type\n", + "UNION ALL\n", + "SELECT\n", + " collection_id,\n", + " release_type,\n", + " 'contract_datesigned' AS date_type,\n", + " min(first_contract_datesigned) AS min,\n", + " max(last_contract_datesigned) AS max\n", + "FROM\n", + " release_summary\n", + "GROUP BY\n", + " collection_id,\n", + " release_type\n", + "ORDER BY\n", + " collection_id,\n", + " release_type,\n", + " date_type;\n", + "\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "daHiVoJkexWi" + }, + "source": [ + "### Release date distribution\n", + "\n", + "Use this section to check that releases are distributed as expected." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DynLHB_12cZ3" + }, + "source": [ + "Plot the count of releases per month:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "0xTqCh28LOQj" + }, + "source": [ + "%%sql release_dates <<\n", + "\n", + "SELECT\n", + " collection_id::text,\n", + " release_type,\n", + " date,\n", + " count(*) AS release_count\n", + "FROM\n", + " release_summary rs\n", + "WHERE\n", + " collection_id IN :collection_ids\n", + "GROUP BY\n", + " collection_id,\n", + " release_type,\n", + " date\n", + "ORDER BY\n", + " date ASC;\n", + "\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "OM2aiiJzAmzM" + }, + "source": [ + "# Resample by month\n", + "release_dates[\"date\"] = release_dates[\"date\"].dt.strftime(\"%Y-%m\")\n", + "release_dates = (\n", + " release_dates.groupby([\"collection_id\", \"release_type\", \"date\"]).agg({\"release_count\": \"sum\"}).reset_index()\n", + ")\n", + "\n", + "plot_releases_by_month(release_dates)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ugoDOjsdkLXN" + }, + "source": [ + "### Extensions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9cxNPLXN8wwc" + }, + "source": [ + "Use this section to check which extensions the data uses." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U7iYgIdW8z1c" + }, + "source": [ + "List the extensions declared in the package metadata:" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qFbWPY5Eq7fv" + }, + "source": [ + "%%sql\n", + "\n", + "SELECT\n", + " collection_id,\n", + " release_type,\n", + " jsonb_array_elements(package_data -> 'extensions') AS ocds_extension,\n", + " count(*) AS count\n", + "FROM\n", + " release_summary\n", + "WHERE\n", + " collection_id IN :collection_ids\n", + " AND package_data IS NOT NULL\n", + "GROUP BY\n", + " collection_id,\n", + " release_type,\n", + " ocds_extension\n", + "ORDER BY\n", + " collection_id,\n", + " release_type,\n", + " count DESC;\n", + "\n" + ], + "execution_count": null, + "outputs": [] + } + ] } \ No newline at end of file
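
Note on the "Objects per stage" cell, which this diff re-indents but does not change: the counting rule described in that cell's markdown (at most one `planning` and one `tender` object per compiled release, any number of `awards` and `contracts`, at most one `implementation` per contract) can be illustrated in plain Python. This is a minimal sketch only. It assumes compiled releases are available as already-parsed dictionaries; the helper name `count_objects_per_stage` and the sample data are hypothetical, and it does not reproduce how the `field_counts` table is actually built.

```python
from typing import Any, Dict, Iterable

# Stage labels mirror the "stage" column produced by the notebook query.
STAGES = ("planning", "tender", "awards", "contracts", "implementation")


def count_objects_per_stage(compiled_releases: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count stage objects across compiled releases (hypothetical helper)."""
    counts = {stage: 0 for stage in STAGES}
    for release in compiled_releases:
        # planning and tender are single objects: at most one per compiled release.
        for stage in ("planning", "tender"):
            if isinstance(release.get(stage), dict):
                counts[stage] += 1
        # awards and contracts are arrays: any number per compiled release.
        awards = release.get("awards") or []
        contracts = release.get("contracts") or []
        counts["awards"] += len(awards)
        counts["contracts"] += len(contracts)
        # implementation is nested: at most one per contract.
        counts["implementation"] += sum(
            1 for contract in contracts if isinstance(contract.get("implementation"), dict)
        )
    return counts


# Invented sample data: two contracting processes (two unique OCIDs).
sample = [
    {
        "ocid": "ocds-xxxxxx-1",
        "tender": {"id": "t1"},
        "awards": [{"id": "a1"}, {"id": "a2"}],
        "contracts": [{"id": "c1", "implementation": {"transactions": []}}],
    },
    {
        "ocid": "ocds-xxxxxx-2",
        "planning": {"budget": {}},
        "tender": {"id": "t2"},
        "awards": [{"id": "a3"}],
    },
]

counts = count_objects_per_stage(sample)
unique_ocids = len({release["ocid"] for release in sample})
```

With this sample, the award count (3) exceeds the number of unique OCIDs (2) while the tender count (2) does not, which is the relationship the cell asks the analyst to check in the plot.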